Programmable digital image processor

ABSTRACT

A programmable image transform system has a programmable addressing block and a programmable arithmetic block. In the programmable addressing block, an input address generator has an input addressing microsequencer and an input addressing memory that stores an input addressing procedure. The microsequencer executes the input addressing procedure to generate addresses from which to request image data. In the programmable arithmetic block, an arithmetic block memory stores an image processing procedure and a microsequencer executes the image processing procedure using the image data to generate transformed image data.

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application No. 60/159,000, entitled “Programmable Image Transform Processor,” filed Oct. 7, 1999, which is incorporated by reference.

BACKGROUND OF THE INVENTION

U.S. Patent application, titled “Programmable Image Transform Processor for a Digital Camera,” Ser. No. 09/188,871, filed Nov. 9, 1998, is incorporated by reference.

U.S. Patent application, titled “Programmable Timing Generator for a Digital Camera,” Ser. No. 09/188,831, filed Nov. 9, 1998, is incorporated by reference.

U.S. Patent application, titled “Programmable Display Controller for a Digital Camera,” Ser. No. 09/188,996, filed Nov. 9, 1998, is incorporated by reference.

1. Technical Field

The invention relates generally to digital image processing, and particularly to a programmable image transform processor for digital image processing.

2. Related Art

In photographic cameras, the image-forming light is sensed and recorded directly on film. Unlike photographic cameras, the electronic still camera uses an electronic image sensor to sense the image-forming light and a separate recording medium to record and store the picture. Because the electronic still camera uses digital technology, the electronic still camera is a type of digital camera.

Typically the electronic image sensor in a digital camera is a solid-state device such as a charge-coupled device (CCD), charge injection device (CID) or a complementary metal oxide semiconductor (CMOS) device. The image sensor connects to electronic interface circuitry which connects to a storage device and, optionally, to a display. A typical image sensor has many cells or pixels arranged along vertical and horizontal dimensions in a matrix. In response to light, the cells generate a charge or voltage which represents image information. The image sensor senses an image and stores image information, i.e., a charge or voltage, corresponding to the sensed light in the cells. Image sensors are made in many sizes such as, e.g., 400×300, 640×480, 1024×768 and 4096×4096 pixels. The image information stored in the cells is output serially from the image sensor using an arrangement of shift registers. The shift registers are arranged along vertical and horizontal dimensions and are coupled to the cells. The cells and shift registers require timing, or clock, signals having specific timing requirements to output the image information. Each type of image sensor has its own unique timing requirements. Typically, a single image sensor requires many clock signals to control the flow of image information in both the horizontal and vertical dimensions. The clock signals must be synchronized. For example, outputting image information from a 640×480 CCD requires 480 vertical shifts and 640 horizontal shifts for each vertical shift. Within a single dimension, the clock signals that control the flow of image information have different phases that must be synchronized. Furthermore, shifting the information out of the image sensor requires timing signals to synchronize the image sensor's operation with an analog signal processor (ASP) and an analog-to-digital (A/D) converter.

The image information sensed by each cell is also called a pixel. For example, a 640×480 CCD has 307,200 pixels. After being converted to digital form, the image information (image data) is stored in a memory, typically an image memory. Image sensors having a larger number of cells produce higher quality images; however, the more pixel information that is available, the more processing and memory resources are required to process it.

Typically, a digital signal processor processes the image data to improve the quality of the image. Various algorithms well-known in the art are used to improve the image quality of the image data. Because there is such a large amount of image data, the image data may be compressed before storage in a storage medium or memory.

Color imaging increases the complexity of processing the image data. In one method, the image sensor has a geometric arrangement of cells to respond to three colors, e.g., red, green and blue. Since each cell senses a particular color, various algorithms are used to interpolate the missing color information. Alternatively, two or more image sensors having different color sensitivity may be utilized and the image information combined.

In digital cameras, processing the data takes time. Analog image data from the image sensor is processed via the analog signal processor, converted into image data by the analog-to-digital converter and stored in memory. Furthermore, a digital signal processor processes the raw image data to improve the quality of the image. For color images that utilize a single image sensor, “missing” pixel data values must be interpolated and require even more processing time. Still images are further processed to compensate and correct for other errors introduced by the optical system and the image sensor. The compression of the image data adds even more time. The time required to acquire, process and compress the image data causes an unacceptable delay when acquiring consecutive images. The delay can take several seconds. This delay is a problem for photographers who need a continuous shooting capability to photograph a sequence of images in quick succession. Therefore a process and apparatus are needed to reduce the delay between consecutive pictures.

Typically, a digital camera has hardware that implements a single digital image processing procedure or algorithm. If the procedure is changed, the hardware must be redesigned, which is time consuming and expensive. Therefore, there is a need in the art for a digital image processing procedure or device that is easily and quickly modified and that supports numerous digital signal processing procedures using the same hardware. The digital image processing procedure or device should also minimize the processing time to allow consecutive pictures to be taken in quick succession.

In addition, depending on the environmental factors, such as lighting, the image processing algorithm should be selected or modified to produce the desired image quality. Furthermore, there is a need to dynamically modify the image processing algorithm during the image acquisition process.

As the size of the image sensors increases, the amount of image information to be processed increases. In addition, as image processing algorithms become increasingly sophisticated, complex processing of the image data consumes more time. Therefore, there is a need to reduce the image processing time.

SUMMARY

The programmable image transform system may be broadly conceptualized as a device that separates address generation from arithmetic manipulation, thus improving the overall efficiency of the device while reducing the time needed to perform image processing. For example, an image transform processor that processes digital images may utilize an architecture that includes a programmable arithmetic processor and a programmable input addresser. The programmable arithmetic processor may be capable of receiving digital image data from a memory, such as a read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or non-volatile memory, over a data bus for processing. The programmable input addresser controls the transfer of image data from the memory to the programmable arithmetic processor. The programmable input addresser provides: (i) a memory address to a read address bus coupled between the programmable input addresser and the memory, and (ii) a storage address to the programmable arithmetic processor. The memory address identifies a location of the digital image data within the memory. The storage address identifies a local buffer within the programmable arithmetic processor for storage of the digital image data.

The invention also relates to retrieval and storage of image data into a memory while other image data is being processed. The retrieved image data is placed in a set of local buffers. To increase the speed of image processing, a single-instruction multiple-data (SIMD) processor processes the image data in the set of local buffers and outputs the processed image data to another set of local buffers. For example, in an image transform processor having buffers, a first portion of input image data is provided in a first one of the buffers. A first processing operation is performed on the first portion of the input image data to define a first processed image data. The first processed image data is stored in a second buffer. A second processing operation is performed on the first processed image data to define a second processed image data. While the second processing operation is performed on the first processed image data, a second portion of the input image data is provided in the first buffer.
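The overlap described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a fetch and each processing operation take one time step each; the function name `overlapped_schedule` and the step labels are hypothetical and do not appear in the specification.

```python
def overlapped_schedule(num_portions):
    """List, per time step, which portion is fetched or processed.

    Assumption for illustration: every fetch and every processing
    operation takes exactly one time step.
    """
    if num_portions <= 0:
        return []
    # The first portion must be fetched before any processing can start.
    steps = [{"fetch": 0}]
    for k in range(num_portions):
        # First operation: first buffer -> second buffer.
        steps.append({"op1": k})
        # Second operation works on the second buffer, so the first
        # buffer is free to receive the next portion at the same time.
        step = {"op2": k}
        if k + 1 < num_portions:
            step["fetch"] = k + 1
        steps.append(step)
    return steps


schedule = overlapped_schedule(3)
for t, work in enumerate(schedule):
    print(t, work)
```

Under these assumptions, three portions finish in 7 steps instead of the 9 that a strictly sequential fetch, operate, operate cycle would need, because every fetch after the first is hidden behind a second processing operation.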

The invention also provides for using the image transform processor for processing video or other real-time data streams. The image transform processor has four buffers that are used for storing the video or real-time data. First and second levels of buffers are alternately used for fetching input data, while third and fourth levels of buffers are alternately used for storing output data. Thus, image data can be input, processed and output in every clock cycle.
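The alternating use of the four buffer levels can be sketched as a simple role assignment per cycle. The specific pairing shown here (which level is filled while which is read) is an assumption made for illustration, as are the function and key names.

```python
def ping_pong_roles(cycle):
    """Return which buffer level plays which role in a given cycle.

    Assumed pairing: levels 0 and 1 alternate as input buffers,
    levels 2 and 3 alternate as output buffers.
    """
    phase = cycle % 2
    return {
        "input_fill": phase,            # level being loaded with new input data
        "process_read": 1 - phase,      # level the processor reads from
        "process_write": 2 + phase,     # level the processor writes results to
        "output_drain": 2 + 1 - phase,  # level being written out to memory
    }


for cycle in range(4):
    print(cycle, ping_pong_roles(cycle))
```

In every cycle the four roles land on four different levels, which is what allows input, processing and output to proceed at the same time in this sketch.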

Other systems, methods, features and advantages of the invention will be or will become apparent to one with skill in the art upon examination of the following figures and detailed description. It is intended that all such additional systems, methods, features and advantages be included within this description, be within the scope of the invention, and be protected by the accompanying claims.

BRIEF DESCRIPTION OF THE FIGURES

The components in the figures are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the invention. Moreover, in the figures, like reference numerals designate corresponding parts throughout the different views.

FIG. 1 is a block diagram of an electronic digital camera embodying an exemplary image transform processor.

FIG. 2 is a block diagram of the digital camera of FIG. 1.

FIG. 3 is a diagram of an exemplary image sensor suitable for use with the image transform processor.

FIG. 4 is a block diagram of a preferred embodiment of the image transform processor of FIG. 2.

FIG. 5 is a block diagram of a programmable block addresser of the image transform processor of FIG. 4.

FIG. 6 is a block diagram of the topology of the arithmetic processing block of FIG. 4.

FIGS. 7A and 7B are exemplary timing diagrams showing the overlapping of data retrieval, data processing, and data storage operations in the arithmetic processing block of FIGS. 4 and 6.

FIG. 8 is a diagram of an exemplary two-dimensional array of working blocks.

FIG. 9 is a diagram of exemplary image data showing the pixel blocks of an exemplary working block.

FIG. 10 is an example of a working block that includes adjacent pixel blocks in the image data.

FIG. 11 is a diagram showing overlapping working blocks in the image data.

FIGS. 12A and 12B are examples of working blocks that include dispersed pixel blocks in the image data.

FIG. 13 is a block diagram of the buffer owner register and next owner register of the arithmetic processing block of FIG. 6.

FIG. 14 is a block diagram of an input buffer controller of FIG. 4.

FIG. 15 is a block diagram of a SIMD processor pipeline.

FIG. 16 is a block diagram of a SIMD processor of the arithmetic processing block of FIGS. 4 and 6.

FIG. 17 is a block diagram of pointer configurations used by an instruction word.

FIG. 18 is a diagram of a circuit that generates an effective address for an instruction.

FIG. 19 is a block diagram of a multiplexor/latch stage of the SIMD processor pipeline of FIG. 15.

FIG. 20 is a block diagram of an arithmetic stage of the SIMD processor pipeline of FIG. 15.

FIG. 21 is a block diagram of a descale/write stage of the SIMD processor pipeline of FIG. 15.

FIG. 22 is a block diagram of an accumulator descaler of the arithmetic stage of the processing element of FIG. 21.

FIG. 23 is a block diagram of an arithmetic logic unit descaler of the arithmetic stage of the processing element of FIG. 20.

FIG. 24 is a block diagram showing the expandable topology of the arithmetic processing block of FIG. 6.

FIG. 25 is a block diagram of an arithmetic processing block of FIG. 6 having multiple master controllers.

FIG. 26 is a flow diagram of an exemplary image transform process of the image transform processor of FIG. 4.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

In FIG. 1, a block diagram of a digital camera 100 embodying the image transform processor is shown. A lens 102 transmits the image-forming light 104 onto an electronic image sensor (image sensor) 106. The image sensor 106 is in the digital camera and located at the focal plane of the lens. The image sensor is typically a charge-coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) sensor. Image sensors differ in the arrangement of the cells within the image sensor and the type of charge readout. The image sensor 106 connects to electronic interface circuitry 108. The electronic interface circuitry 108 also connects to a storage device 110 and an optional display 112. The electronic interface circuitry 108 controls the storage device 110 and stores the image sensed by the image sensor 106. The storage device 110 can include a tape drive, a disk drive, such as a floppy disk drive, hard disk drive, optical disk drive or magneto-optical disk drive, or an integrated circuit card with RAM, DRAM, EEPROM, or other non-volatile memory. The storage device 110 may be inside the digital camera 100 or attached to the digital camera externally. The electronic interface circuitry 108 may also control the display 112 that displays the image sensed by the image sensor 106. The display 112 can be inside the digital camera or attached to the camera externally. The electronic interface circuitry can operate the display 112 in either a viewfinder mode or a review mode (i.e., a stored image viewing mode).

In FIG. 2, a block diagram of the electronic interface circuitry of the digital camera of FIG. 1 is shown. A microprocessor (RISC) 202 is coupled to a memory controller 203a, a programmable timing generator 204, a frame capture processor 205, a programmable image transform processor 206, a storage medium 208 and a programmable display controller 209. The memory controller 203a is connected to a memory 203. The programmable display controller 209 is coupled to a display 210. The image sensor 106 is coupled to an analog signal processor (ASP) 211 which connects to the analog to digital converter (A/D converter) 212. The programmable timing generator 204 is coupled to the image sensor 106, the ASP 211, the A/D converter 212, the frame capture processor 205, and the microprocessor (RISC) 202. The programmable image transform processor 206 and other elements read data from and write data to the memory 203 via the memory controller 203a. Preferably, the memory 203 is a high-speed DRAM used to store the digital image data. The A/D converter 212 supplies digital image data to the programmable image transform processor 206 that stores the data in the memory 203. The timing generator 204 supplies timing signals to the programmable image transform processor 206 and A/D converter 212 to synchronize the transfer of digital image data between the A/D converter 212 and the frame capture processor 205. The frame capture processor 205 supplies the digital image data to the programmable image transform processor 206. Alternately, the frame capture processor 205 stores the image data from the sensor directly to the memory 203, and the programmable image transform processor 206 fetches that data from the memory 203 for further processing. The frame capture processor 205 supports real-time windowing, histogram, gamma, white balance and auto-focus functions.

The microprocessor (RISC) 202 executes a camera operation procedure that is stored in memory 203. Alternatively the camera operation procedure can be stored in a read-only-memory (ROM), or loaded into the memory 203 from the storage medium 208. Further, in alternate embodiments, the RISC microprocessor may be replaced by a different type of controller, such as a typical microprocessor, digital signal processor, application specific integrated circuit (ASIC), programmable array logic (PAL), or discrete circuits functioning as a controller. The camera operation procedure comprises an image acquisition procedure. When a user presses a store-image button (not shown), the camera operation procedure causes the image sensor 106 to acquire an image. The image acquisition procedure causes the microprocessor (RISC) 202 to control the timing generator 204 to generate vertical and horizontal clock signals for use by the image sensor 106. The image sensor 106 outputs the image as a series of analog signals corresponding to the color and intensity of the image sensed by each cell. The sensed image information is then sent to the ASP 211 and to the A/D converter 212.

The ASP 211 processes the sensed image information before input to the A/D converter 212. For example, the ASP has a programmable amplifier with adjustable gain, and also reduces or eliminates noise, such as reset noise, from the sensed image information using methods well known to those in the art, such as correlated double sampling. The A/D converter 212 then converts the analog sensed image information into image data. In an alternative embodiment, the ASP 211 is absent and no pre-processing of the sensed image data occurs.

The image data is stored in memory 203. Execution of the camera operation procedure by the microprocessor (RISC) 202 causes the image data to be processed by the programmable image transform processor 206. The processed image data is compressed and recorded in memory 203, on a storage medium 208 or transferred to a programmable display controller 209 for output to a display 210.

In FIG. 3, a block diagram of an exemplary image sensor 302 is shown. The image sensor 302 can be a CCD or CMOS device. The image sensor 302 connects to the analog signal processor (ASP) 304 and the A/D converter 306. The image sensor 302 has cells 308, vertical shift registers 312 and a horizontal shift register 314. Each cell 308 absorbs light and converts the light energy into an electrical charge. The amount of charge is a measure of the amount of light energy or radiation absorbed by the image sensor 302. The size of the image sensor 302 determines the quality of the image. The quality of the image improves as the number of cells 308 increases. Image sensors are available in many sizes including 400×300, 640×480, 1024×768, and 4096×4096 cells.

The components of the image sensor 302 are arranged along horizontal and vertical dimensions. An array 310 of cells 308 is arranged in the vertical dimension. The vertical shift register 312 has register locations 316 for storing the charge sensed by the cells 308. Each cell 308 in the array 310 of cells connects to a corresponding register location 316 in the vertical shift register 312.

Free charges move from regions of higher potential to regions of lower potential. By alternating the voltage on the electrodes (not shown) connected to the cells 308 and the register locations 316 and 318 of the shift registers 312 and 314 in proper phase, a charge packet, i.e., the charge from the cell 308, can be moved from the cell 308 to a register location 316 in the shift register 312. The charge packet is then moved from one register location to another register location in the shift registers 312 and 314 until finally output by the image sensor 302.

When appropriate voltages are applied to the cell 308 and the corresponding register location 316 in the vertical shift register 312, the charge generated in the cell 308 is transferred out of the cell 308 to the corresponding register location 316 in the vertical shift register 312. The programmable timing generator is programmed to output timing or clock signals to cause the transfer of the charge to occur at synchronized times. When appropriate voltages are applied to adjacent elements of the vertical shift register 312, the charge is transferred from one register location to the next register location. The last element or output of each vertical shift register 312 connects to a corresponding register location 318 in the horizontal shift register 314. When appropriate voltages are applied to the last register location of the vertical shift register 312 and the corresponding register location 318 of the horizontal shift register 314, the charge is transferred from the vertical shift register 312 to the horizontal shift register 314. When appropriate voltages are applied to adjacent register locations of the horizontal shift register 314, the charge is transferred from one register location to another register location until finally output. The output of the horizontal shift register 314 connects to the ASP 304 via an output amplifier 320.
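The readout sequence described above can be modeled with a short sketch that treats each charge packet as a number and counts the shifts. It is a simplified model under stated assumptions: one vertical shift register per column of cells, and an arbitrary choice of which end of each register is output first. The function name `read_out` is hypothetical.

```python
def read_out(cells):
    """Serially read out a grid of cell charges through shift registers.

    cells is a list of rows. Returns (output, vertical_shifts,
    horizontal_shifts). The output order is an assumption of this model.
    """
    rows = len(cells)
    cols = len(cells[0]) if rows else 0
    # Transfer every cell's charge into its vertical shift register
    # (assumed: one vertical register per column).
    vertical = [[cells[r][c] for r in range(rows)] for c in range(cols)]
    output = []
    vertical_shifts = 0
    horizontal_shifts = 0
    for _ in range(rows):
        # One vertical shift moves the last location of every vertical
        # register into the horizontal shift register.
        horizontal = [column.pop() for column in vertical]
        vertical_shifts += 1
        # The horizontal register is then emptied one location at a time.
        for _ in range(cols):
            output.append(horizontal.pop())
            horizontal_shifts += 1
    return output, vertical_shifts, horizontal_shifts


pixels, v_shifts, h_shifts = read_out([[0] * 640 for _ in range(480)])
print(v_shifts, h_shifts, len(pixels))
```

For a 640×480 sensor the model performs 480 vertical shifts and 640 horizontal shifts per vertical shift, giving 307,200 horizontal shifts in total, one per pixel.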

Color imaging is more complex. In one method, the image sensor 302 has a geometric arrangement of cells to respond to three colors, e.g., red, green and blue. Alternatively, two or more image sensors having different color sensitivity are used. The programmable image transform processor of the present invention works with both methods of color imaging. The programmable image transform processor performs image transform operations on input data after it has been digitized by the A/D converter 306.

In FIG. 4, a block diagram of an embodiment of the programmable image transform processor (ITP) 206 of FIG. 2 is shown. Image transformation and compression operations, such as discrete wavelet transforms (DWT) and discrete cosine transforms (DCT), perform two main types of computation: address calculation and arithmetic computation. Devices such as digital cameras store images, at least temporarily, in solid-state memory such as a DRAM. The memory is organized into pages of image data. To acquire image data from the memory, an address is generated. After generating the address and acquiring the desired image data, the image data is further manipulated. The ITP 206 separates the address calculation from the arithmetic computation using parallel hardware. The ITP collects input image data and output image data in bursts when accessing the same memory page.
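The address-calculation half of such an operation, and the grouping of accesses that fall on the same memory page, can be sketched as follows. The row-major layout, the `row_stride` and `page_size` parameters and both function names are assumptions made for illustration; the specification does not give a page size.

```python
from itertools import groupby


def block_addresses(base, row_stride, block_rows, block_cols):
    """Word addresses of a two-dimensional block in a row-major image."""
    return [
        base + r * row_stride + c
        for r in range(block_rows)
        for c in range(block_cols)
    ]


def group_into_bursts(addresses, page_size):
    """Group consecutive addresses that fall on the same memory page."""
    return [
        list(run)
        for _, run in groupby(addresses, key=lambda a: a // page_size)
    ]


# A 2x4 block in a 640-word-wide image, with an assumed 512-word page.
addrs = block_addresses(base=0, row_stride=640, block_rows=2, block_cols=4)
print(group_into_bursts(addrs, page_size=512))
```

In this example the two rows of the block fall on different pages, so the eight requests form two bursts of four words each.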

The ITP 206 has inputs and outputs for connecting to a read address bus, a read data bus, a write address bus, a write data bus and control signals. The ITP 206 connects to the memory, the A/D converter, the timing generator and the microprocessor (RISC). A DMA controller may be used to access the high speed image memory. The ITP 206 may be dynamically configurable to provide many pipelined data processing paths. In an addressing block 410, a data path mode register 412 controls an input data multiplexor 414 and an output data multiplexor 416 to control the flow of image data to and from a programmable arithmetic processing block 420. The programmable arithmetic processing block 420 receives the image data, processes the image data and outputs the processed image data. The microprocessor (RISC) of the digital camera loads the data path mode register 412 with specified data flow path information.

In response to the data flow path information being loaded in the data path mode register 412, the input data multiplexor 414 supplies data from the microprocessor (RISC), from a frame capture processor, from a Huffman decoder 422, or directly from the DRAM. The frame capture processor provides an analysis of the image data as it is received from the programmable timing generator. The Huffman decoder 422 decodes compressed image data that was stored using the Joint Photographic Experts Group (JPEG) compression format in the external memory.

In response to the data in the data path mode register 412, the output data multiplexor 416 outputs data from the microprocessor (RISC), processed image data from the programmable arithmetic processing block 420, or encoded processed image data from a Huffman encoder 424. The Huffman encoder 424 compresses data from the programmable arithmetic processing block 420 using a JPEG compression format.

Table 1, below, summarizes the data flow for various data path configuration settings of the data path mode bits of the data path mode register 412. In Table 1, the term “ITPBUF” refers to the programmable arithmetic processing block 420, and in particular to local buffers in the programmable arithmetic processing block 420.

TABLE 1
Data Path Configuration Settings
(Buffer owners for L0 through L3: PE = 0, IBC = 1, OBC = 2, RISC = 3)

Mode                   Mode Bits  Data Flow                                 L0   L1   L2   L3
Video                  000100     FCP to ITPBUF to DRAM                     1/0  1/0  0/2  0/2
Frame Blend            000000     DRAM to ITPBUF to DRAM                    1/0  1/0  0/2  0/2
Process                000000     DRAM to ITPBUF to DRAM                    1/0  0    0    0/2
Process/Encode         000001     DRAM to ITPBUF to HUFF to DRAM            1/0  0    0    0/2
Decode/Process         000010     DRAM to ITPBUF to HUFF to DRAM            1/0  0    0    0/2
RISC/RISC              111000     RISC to ITPBUF to RISC                    3    3    3    3
RISC replace IBA       010000     DRAM to ITPBUF to RISC to DRAM            3/0  0    0    0/3
RISC replace OBA       001000     DRAM to RISC to ITPBUF to DRAM            3/0  0    0    0/3
RISC replace IBA, OBA  011000     DRAM to RISC to ITPBUF to RISC to DRAM    1/0  0    0    0/3
IBA help RISC          101000     DRAM to ITPBUF to DRAM                    3    3    3    3/2
OBA help RISC          110000     RISC to ITPBUF to DRAM                    3    3    3    3/2
IBA & OBA help RISC    100000     DRAM to ITPBUF to RISC to ITPBUF to DRAM  1/3  3    3    3/2

The programmable addressing block 410 generates addresses and coordinates handshaking signals to retrieve image data from and to store data to the external memory. Image data does not flow through the programmable addressing block 410 but flows to the local buffers of the programmable arithmetic processing block 420. The programmable addressing block 410 supplies control signals to coordinate the transfer of image data with the programmable arithmetic processing block 420.

The programmable addressing block 410 has an input addresser 430 and an output addresser 440. In the input addresser 430, an input block addresser (IBA) 442 provides addresses to a read address bus to request data from an external memory, such as a DRAM, using handshaking control signals, such as read address available (R_Address Avail) and read address acknowledge (R_Address Ack). An input buffer controller (IBC) 444 supplies addressing information to the local buffers of the programmable arithmetic processing block 420 so that the requested image data, which arrives from the external memory on a read data bus, is stored in buffers in the programmable arithmetic processing block 420 using handshaking signals. The handshaking signals are a read data available signal (R_Data avail) and a read data acknowledge signal (R_Data ack).

In the output addresser 440, an output block addresser (OBA) 446 provides addresses to a write address bus to store data in the external memory using handshaking control signals such as write address available (W_Address Avail) and write address acknowledge (W_Address Ack). An output buffer controller (OBC) 448 supplies addressing information to the local buffers of the programmable arithmetic processing block 420 to transfer the image data from the local buffers of the programmable arithmetic processing block 420 to the external memory. The output buffer controller 448 uses handshaking signals to retrieve the processed image data from the programmable arithmetic processing block 420. The OBC 448 uses handshaking signals, such as a write data available signal (W_Data avail) and a write data acknowledge signal (W_Data ack), to coordinate the transfer of data from the local buffers of the programmable arithmetic processing block 420 to the external memory.

The programmable arithmetic processing block 420 receives the image data, processes the image data and outputs the processed image data. A SIMD master controller 450 controls the operation of the programmable arithmetic processing block 420. Both the programmable arithmetic processing block 420 and the SIMD master controller 450 communicate with the camera's microprocessor (RISC) 202 (FIG. 2).

The Addressers

The input block addresser (IBA) 442 and output block addresser (OBA) 446 supply addresses to the read address bus and the write address bus, respectively. The IBA 442 provides addresses of requested data to supply to the read data bus, i.e., data to be operated on by the programmable arithmetic processing block 420. In particular, the IBA 442 generates DRAM memory word addresses for two-dimensional blocks or lines of image data. The OBA 446 provides addresses of processed data to write to the write data bus, such as image data to be stored in the image memory.

The Input Block Addresser

Referring to FIG. 5, the input block addresser 442 is configurable (i.e., programmable). The input block addresser 442 has a microsequencer 460, a control store or instruction memory 462, and pointer registers A, B, C and D 464. The input block addresser 442 has four loop counters 466, four general purpose registers 468 and four pointer registers 464. The pointer registers A, B, C and D 464 generate the input address which is output to the read address bus by the multiplexor 476. The input block addresser 442 also has a base page register 470, a stack 472 that is part of the control store 462, and a stack pointer 474. The control store 462 is typically implemented using a static RAM array.

The microsequencer 460 is coupled to the control store 462 and the pointer registers 464 and generates the input data addresses to access the memory, such as a DRAM, storing the image data. The microsequencer 460 stores the addresses in the pointer registers 464. The addresses in the pointer registers 464 are utilized to access the DRAM memory. Data requested by the input block addresser 442 is stored in a buffer in the programmable arithmetic processing block 420 (FIG. 4). A multiplexor 476 selects the address in one of the pointer registers 464 to output to the read address bus based on commands executed by the microsequencer 460.

The control store 462 stores an input block address procedure 478 to be executed by the input addresser microsequencer 460. The input block address procedure 478 has a sequence of address generation instructions.

The input block addresser 442 has a data request command to initiate read operations to the image memory and to supply an absolute address to the read address bus. The microsequencer 460 can set a loop counter 466 to generate the desired number of request/acknowledge cycles. The microsequencer 460 loads and decrements the loop counter 466. The microsequencer 460 has other instructions enabling values to be added to and subtracted from the pointer registers 464. Branching instructions can be responsive to the loop counter 466 and conditions. Call and return instructions are used with the stack 472 and stack pointer 474. Push and pop instructions are also used to push and pop the values in the general purpose registers 468, pointer registers 464, and loop counter registers 466 on and off the stack 472. The general purpose registers 468, pointer registers 464, and loop counter registers 466 can be loaded from other general purpose 468 and pointer registers 464. The contents of the general purpose 468, pointer 464 and loop counter 466 registers can be loaded with a constant value or added to each other. Table two describes a portion of the instruction set of the microsequencer 460.

TABLE 2 Input Addresser Microsequencer Instruction Set

Instruction: Description

-   MADDPT: The MADDPT instruction adds a value to a specified pointer register. This instruction is similar to the DRQ instruction except that no data is requested. An immediate value ranging from zero to seven can be added to the specified pointer register, or the contents of one of the general purpose registers can be added to the pointer register.
-   MSUBPT: This instruction subtracts a value from a specified pointer register. An immediate value ranging from zero to seven can be subtracted from the specified pointer register, or the contents of one of the general purpose registers can be subtracted from the pointer register.
-   LOOP: The loop instruction branches to a specified address when a specified loop counter register does not equal zero and decrements the loop counter.
-   LCI: This instruction loads a loop counter register with an immediate value.
-   BR: The branch instruction causes the microsequencer to execute the instruction at a specified address.
-   CALL: The call subroutine instruction calls a subroutine. The return address is pushed onto the stack and the microsequencer's instruction pointer is loaded with a specified address. A stack pointer register is also decremented.
-   LD: The Load Source to Destination instruction loads a specified destination register, such as one of the general purpose, pointer or loop counter registers, from a specified source register, such as one of the general purpose or pointer registers.
-   ADD: The add instruction adds the contents of the specified source and destination registers and stores the result in the destination register.
-   PUSH: Push decrements the stack pointer and writes the contents of the specified register onto the stack.
-   POP: POP writes the data pointed to by the stack pointer from the stack into the specified register and increments the stack pointer.
-   INC: Increments any specified register.
-   LDMODE: Loads the Input Block addresser's Mode register with a three-bit immediate value. The arithmetic block has a branch instruction that tests the state of any one of the three bits.
-   SET: Sets the addresser's DONE flag in the collector's interrupt register to signal the end of an operation.
-   NOP: No operation.
-   RET: Return from subroutine pops the stored instruction address from the stack and places the instruction address in the microsequencer's instruction pointer.
-   HALT: The halt instruction stops the microsequencer from executing the program in the control store.
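By way of illustration only, the following Python sketch interprets a handful of the Table 2 instructions to show how a short address procedure could generate a block of addresses. The tuple encoding, the number of registers, the LDI helper for loading a pointer with a constant, and the assumed behavior of DRQ (request the word at the pointer, then add an immediate to the pointer) are hypothetical and are not taken from the instruction set itself.

```python
def run_addresser(program, max_steps=10_000):
    """Interpret a small, hypothetical encoding of a few Table 2 instructions.

    Returns the list of requested addresses and the state of the DONE flag.
    """
    pointers = [0] * 4   # pointer registers (register count is an assumption)
    loops = [0] * 4      # loop counter registers (count is an assumption)
    addresses = []       # addresses supplied to the read address bus
    done = False
    pc = 0
    for _ in range(max_steps):
        op, *args = program[pc]
        pc += 1
        if op == "LDI":          # hypothetical: load a pointer with a constant
            pointers[args[0]] = args[1]
        elif op == "LCI":        # load a loop counter with an immediate value
            loops[args[0]] = args[1]
        elif op == "DRQ":        # assumed: request data, then advance the pointer
            addresses.append(pointers[args[0]])
            pointers[args[0]] += args[1]
        elif op == "MADDPT":     # add to a pointer without requesting data
            pointers[args[0]] += args[1]
        elif op == "LOOP":       # branch and decrement while the counter is nonzero
            if loops[args[0]] != 0:
                loops[args[0]] -= 1
                pc = args[1]
        elif op == "SET":        # signal the end of the operation
            done = True
        elif op == "HALT":
            return addresses, done
        else:
            raise ValueError(f"unsupported instruction: {op}")
    raise RuntimeError("step limit reached without HALT")


# Fetch three rows of four words each from an image whose rows are ten words apart.
program = [
    ("LDI", 0, 100),    # pointer 0 = base address 100
    ("LCI", 0, 2),      # outer loop counter: body runs 2 + 1 = 3 times
    ("LCI", 1, 3),      # inner loop counter: body runs 3 + 1 = 4 times
    ("DRQ", 0, 1),      # request one word and step to the next word
    ("LOOP", 1, 3),     # repeat the request for the rest of the row
    ("MADDPT", 0, 6),   # skip the remaining 10 - 4 words of the image row
    ("LOOP", 0, 2),     # next row
    ("SET",),
    ("HALT",),
]
addresses, done = run_addresser(program)
```

Because LOOP, as described, branches while the counter is nonzero and then decrements it, a counter loaded with n in this sketch runs the loop body n + 1 times.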

The microprocessor in the camera loads the control store 462 of the input block addresser 442 with the input block address procedure 478 for each image transform operation. In response to the microprocessor, the microsequencer 460 begins executing the input block address procedure 478 and generates the desired addresses. Those skilled in the art recognize that the output block addresser 446 and output buffer controller 448 have similar components and operate in a similar manner to the input block addresser 442 and input buffer controller 444.

The Output Block Addresser

The output block addresser 446 is a duplicate of the input block addresser 442 shown in FIG. 5 except that the output block addresser 446 generates addresses for storing the data from the local buffers of the programmable arithmetic processing block 420 in the external memory. The description for the configuration registers and microsequencer of the input block addresser 442 applies to the output block addresser 446. To generate the addresses, the control store stores an output block address procedure instead of the input block address procedure. In an alternative embodiment, the output block addresser 446 can have different features from the input block addresser 442 and therefore would not be a duplicate.

The Programmable Arithmetic Processing Block

In FIG. 6, the topology of the programmable arithmetic processing block 420 of FIG. 4 is shown. In the arithmetic processing block 420, a four-by-five array of local buffers (LB) 500 is associated with five processing elements (PE) 510, 511, 512, 513 and 514. The local buffers 500 are arranged in rows and columns. Two rows of local buffers 520 and 521 function as input buffers to receive data in response to the input buffer controller 444 (FIG. 4). The other two rows of local buffers 522 and 523 function as output buffers to output processed image data in response to the output buffer controller 448. Each column is referred to as a bank, and each row is referred to as a level. The local buffers are designated as LB(XY), where X is the column (bank) designation and Y is the row (level) designation. For example, LB(00) refers to the local buffer in bank zero, row zero 520. The RISC stores data to and reads data from each of the local buffers 500. In one embodiment, each local buffer may be implemented as a 768-byte single-ported memory.

In each bank (Bank0-Bank4) of the four-by-five array, a respective processing element (PE) 510-514 is associated with the local buffers 500 of that bank. However, another bank, bank five, that has no associated PE, is included to provide boundary data for PE four 514 of bank four, such as when performing convolutional algorithms on the image data. In one embodiment, local buffer LB(53) may be omitted because level three is primarily used as a temporary buffer for the processed image data that is to be output by the output block addresser 446 and output buffer controller 448.

Each PE 510-514 accesses image data from and stores image data in each of the local buffers in its bank. Each processing element 510-514 also accesses the image data in the local buffers 500 of the right adjacent bank, such as when performing convolutional algorithms. The SIMD master controller 450 simultaneously controls the operation of the processing elements 510-514 such that each processing element 510-514 executes the same instruction. The SIMD master controller 450 has a memory that stores an image processing procedure that controls the operation of the processing elements 510-514 and local buffers 500. During processing, blocks of data are continuously fetched to the local buffers, either from external memory by the input block addresser 442 and input buffer controller 444, or from the image sensor via the frame capture processor.

Many image processing algorithms can be decomposed into a series of discrete phases, each performing a single step of the image processing algorithm. In each step of the image processing procedure, the SIMD master controller 450 will read input data from one level of buffers, perform the computation, then store the image data resulting from that computation to a different level of buffers. Simultaneously, additional input image data is loaded into another level of buffers, and the output block addresser 446 stores the results of the computation on a previous block of image data from another level.

In FIG. 7A, an exemplary timing diagram of the local buffer pipeline is shown. In phase zero, at the start of processing, all local buffers 500 (FIG. 6) are owned by the SIMD master controller 450 (FIG. 6). When the SIMD master controller 450 assigns ownership of the level zero buffers to the IBA/IBC, using the “assign level” instruction, the IBA/IBC loads the first block of input data into the buffers of level zero. When the load completes, ownership of the local buffers 500 of level zero is returned to the SIMD master controller 450.

In phase one, the processing elements read data from level zero (SIMD READ), perform the first processing step, and store the result in the buffers 500 of level one (SIMD WRITE). When the first processing step completes, and the data in the level zero buffers is no longer needed, ownership of the buffers of level zero is transferred back to the IBA/IBC by the SIMD master controller 450 so that the next block of input image data can be fetched.

In phase two, the IBA/IBC loads image data in the buffers of level zero, while the SIMD master controller 450 performs the next processing step by reading the buffers 500 of level one (SIMD READ), and writing to the buffers of level two (SIMD WRITE). In phase three, the SIMD master controller 450 performs the final processing step by reading the data from the buffers of level two (SIMD READ), and writing image data to the buffers of level three (SIMD WRITE). When this processing step is complete, ownership of the buffers 500 of level three is transferred to the OBA/OBC, so that the output image data can be stored in the external memory. When the OBA/OBC completes the transfer of the output image data to the external memory, ownership of the buffers 500 of level three is returned to the SIMD master controller 450. Meanwhile, the SIMD master controller 450 begins processing the second block of input image data.

As shown in the example of FIG. 7B, for video processing or other real-time data streams, the buffers of level zero and level one are alternately used for fetching input image data, while the buffers of level two and level three are alternately used for storing output image data. In this way, image data is input, processed and output in every cycle.
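The alternating use of the buffer levels can be sketched as a per-cycle schedule. The following Python fragment is a minimal sketch only: the exact pairing of levels within a cycle (which input level the processing elements read while the other is filled, and which output level is written while the other is stored) is an assumption made for illustration, and the function and key names are hypothetical.

```python
def video_schedule(cycles):
    """Assumed level assignment per cycle for a FIG. 7B style pipeline.

    None marks a stage that has no valid data yet while the pipeline fills.
    """
    schedule = []
    for k in range(cycles):
        schedule.append({
            "cycle": k,
            # input level being filled by the IBA/IBC in this cycle
            "ibc_loads": k % 2,
            # input level that was filled during the previous cycle
            "simd_reads": (k + 1) % 2 if k >= 1 else None,
            # output level that the OBA/OBC is not currently storing
            "simd_writes": 2 + (k + 1) % 2 if k >= 1 else None,
            # output level that was written during the previous cycle
            "obc_stores": 2 + k % 2 if k >= 2 else None,
        })
    return schedule


schedule = video_schedule(6)
```

Once the pipeline is full (cycle two onward in this sketch), each cycle has one input level being loaded, the other input level being read, one output level being written and the other output level being stored, so no level has two owners at once.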

The local buffers reduce the address range of the SIMD master controller 450, reduce power consumption by minimizing the number of external memory accesses, and increase the efficiency by allowing long “burst” data transfers with the external memory. This topology also improves the overall image processing performance without the cost of a complex caching scheme by allowing data fetches and stores to occur in parallel with image processing.

The combination of the block addressers (IBA, OBA), buffer controllers (IBC, OBC) and local buffers 500 allows image data to be transferred to and from the local buffers 500 in complex ways. Either in cooperation with the block addresser or directly from the frame capture processor, words of data are transferred to and from the local buffers by the buffer controllers. The buffer controllers have several interconnected counters. A small register set within each buffer controller configures the range of the counters. The counters determine the order in which the local buffers 500 are addressed. By loading and executing a block address procedure and configuring the registers of the buffer controllers, data can be fetched in complex orderings from the external memory and be arranged in the local buffers for subsequent processing.

Referring to FIG. 8, working block columns (WBC) are shown. Referring also to FIG. 9, to visualize how the block addressers and buffer controllers operate, consider an eight-bit monochrome image, 584 pixels wide by 384 pixels high, which is to be divided into sixteen-by-sixteen pixel blocks. One pixel block 570 is provided to each processing element for processing. In some applications, the transfer of image data from the external memory to the local buffers is a copy between multi-dimensional arrays. The image data in the external memory is a large two-dimensional array with rows and columns of pixels. This large array can also be represented with many two-dimensional arrays of sixteen-by-eighty pixels, each of which has five pixel blocks 570 and forms a WBC 580. The WBC 580 is loaded into the SIMD master controller 450 (FIG. 6), one per level of buffers, for processing. For example, the working block 580 of FIG. 8 has five pixel blocks 570 of image data and is loaded into local buffers 00, 10, 20, 30 and 40 of level zero. More generally, a WBC is a subset of the image data that is distributed across a predetermined subset of the local buffers of a level. The WBC is also the unit of data transfer between the external memory and the local buffers.

In the image data of FIG. 9, the first sixteen complete rows of image data form a strip 590 that is five-hundred eighty-four pixels wide by sixteen pixels high. The strip has thirty-seven pixel blocks, the last of which is eight pixels wide, instead of sixteen. An exemplary set of adjacent pixel blocks (shaded), making up an exemplary working block, is also shown. Since there are thirty-seven pixel blocks in most strips, to transfer an entire strip, seven full WBCs, each having five pixel blocks, are transferred. An eighth, partial, working block, having a single full pixel block and a single partial pixel block, is also transferred. Because the block addressers and the buffer controllers transfer thirty-two bit words, all pixel block row dimensions, and therefore working block row dimensions, are multiples of thirty-two bits.
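The strip arithmetic of this example can be checked with a short calculation. The following Python sketch is illustrative only; the function name and the dictionary keys are hypothetical, and the defaults simply restate the sixteen-by-sixteen pixel blocks and five pixel blocks per WBC of the example.

```python
def strip_layout(image_width, image_height, block_size=16, blocks_per_wbc=5):
    """Count pixel blocks, working blocks and strips for a given image size."""
    full_blocks, last_block_width = divmod(image_width, block_size)
    # A trailing partial pixel block still counts as a pixel block of the strip.
    pixel_blocks = full_blocks + (1 if last_block_width else 0)
    full_wbcs, leftover_blocks = divmod(pixel_blocks, blocks_per_wbc)
    strips = -(-image_height // block_size)  # ceiling division
    return {
        "pixel_blocks_per_strip": pixel_blocks,
        "last_block_width": last_block_width or block_size,
        "full_wbcs_per_strip": full_wbcs,
        "blocks_in_partial_wbc": leftover_blocks,
        "strips": strips,
    }


layout = strip_layout(584, 384)
```

For the 584-by-384 image, this gives thirty-seven pixel blocks per strip, a last pixel block eight pixels wide, seven full WBCs and a partial working block holding the two remaining pixel blocks.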

In FIG. 10, an exemplary set of WBCs 580 of image data is shown. Because image data is transferred in working blocks, all strips of the image data are transferred in the same manner. More generally, the block addressers and buffer controllers are not limited to sixteen-by-sixteen pixel blocks. The block addressers and buffer controllers are designed to operate with any number (N) of rows and any number (M) of columns per pixel block, and any number (P) of pixel blocks per strip.

When transferring data, the buffer controllers read or write an entire working block of image data at a time, starting from the top left bank (bank zero, pixel block row zero, local buffer bank zero), traversing the entire working block row, then continuing with the first pixel of the second row (pixel block row one, local buffer column zero). The block addresser is programmed to generate addresses to fetch the data from the external memory in the aforementioned order. Referring to FIG. 11, an alternate arrangement of working blocks of image data is shown. Both the rows and the columns of the working blocks overlap.

In FIG. 12A, an alternate arrangement of pixel blocks of a WBC is shown.The pixel blocks 570 of the WBCs are shaded. In this example, the pixelblocks of the WBCs are not adjacent in a column but staggered.

In FIG. 12B, another alternate arrangement of the pixel blocks is shown.The pixel blocks 570 of the WBCs are shaded. In this example, the pixelblocks are not adjacent, and are dispersed throughout the image data andnot in a column.

In FIG. 13, the various components of the image transform processor are shown. The buffer owner register 600 and buffer next owner register 610 are shown. In the buffer owner register 600 and buffer next owner register 610, a set of buffer owner bits that designate the owners of buffer level zero 612, one 614, two 616 and three 618 are shown. For the buffers at each level, a multiplexor 620 receives the buffer owner bits from the buffer owner register 600 and the buffer next owner register 610. For each level, a toggle bit 622 connected to the select line of a respective multiplexor 620 selects the specified set of owner bits. The toggle bit 622 is set by the buffer controllers. A semaphore system is used to determine which device has ownership and when to switch the ownership of a particular buffer level. The RISC loads the buffer owner register 600 and the buffer next owner register 610.

Buffer Controller

In FIG. 14, each buffer controller 444 and 448 has a synchronous memory interface with data-request-acknowledge handshaking and a thirty-two bit data bus. The buffer controllers 444 and 448 access the local buffers in a preconfigured sequence until a preconfigured limit is reached. The input buffer controller 444 accesses the local buffers in levels zero and one. The output buffer controller 448 accesses the local buffers in levels two and three.

The buffer controllers 444 and 448 supply address and control signals to the local buffers, and access the local buffers that are specified by the buffer owner register as owned by that buffer controller 444 or 448, to read data from or write data to specified locations in the local buffers. The buffer controllers 444 and 448 utilize a set of cascaded counters including an I-counter 630, a bank counter 632 and a J-counter 634 to generate the control signals to cycle through the level of local buffers specified by the buffer owner register as being owned by the input buffer controller (IBC) 444. Each local buffer 500 is arranged in rows and columns. The I-counter 630 generates a “pixel block column” signal that specifies a column in the local buffer. The bank counter 632 generates a “bank select” signal that specifies a particular bank of local buffers. The J-counter 634 generates a “pixel block row” signal that specifies a row in the local buffer. In other words, a particular local buffer is specified by the buffer owner register and the bank counter 632. Within each local buffer, the I-counter 630 and J-counter 634 select a particular column and row.

The clock input of the I-counter 630 is connected to data available; therefore, the I-counter 630 is incremented each time a word is transferred to the local buffer. To cascade the counters, the carry from the I-counter 630 is connected to the clock input of the bank counter 632, and the carry of the bank counter 632 is connected to the clock input of the J-counter 634. For example, a local buffer and a row within that local buffer are specified; the IBC 444 transfers data to each column in the specified row of the local buffer and then changes to the local buffer in the next bank. The IBC 444 continues to transfer data across the columns and change banks until the last bank is reached. After data has been transferred to the last bank, the IBC 444 increments the J-counter 634 and transfers data to the next row. The output buffer controller 448 is the same as the IBC 444 except for the signaling to transfer data to and from the local buffers, and the connection of the clock input of the I-counter 630 to “data taken” rather than “data available”.

In particular, the I-counter 630 counts pixel block columns to generate the pixel block column select signal to select a particular column within each local buffer 500. After the last pixel block column is transferred to the local buffer, the carry bit of the I-counter 630 is set. The bank counter 632 counts the banks and generates bank select signals to select a particular bank. In response to the carry bit from the I-counter 630, the bank counter 632 is incremented and selects a different bank. The J-counter 634 counts the rows of the pixel blocks and generates row select signals to select a particular row within each local buffer. Each counter 630, 632 and 634 is associated with at least one maximum count value register which determines when the corresponding counter generates a carry and is reset to zero. The maximum count value registers will be described below.
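The traversal order produced by the cascaded counters can be illustrated in software. The following Python sketch is a simplified model, not a description of the circuit: it assumes an I-counter increment of one, assumes that the word address within a local buffer is the sum of the J-counter and I-counter values, and ignores the separate limits used for the last pixel block and last working block of a strip. The function and parameter names are hypothetical.

```python
def transfer_order(max_i, max_bank, j_inc, max_j):
    """Return (bank, word address) pairs in the order the counters visit them."""
    if max_j > 0 and j_inc <= 0:
        raise ValueError("j_inc must be positive when more than one row is used")
    i = bank = j = 0
    order = []
    while True:
        order.append((bank, j + i))
        if i < max_i:            # I-counter: next word of the pixel block row
            i += 1
            continue
        i = 0                    # I-counter carry clocks the bank counter
        if bank < max_bank:
            bank += 1
            continue
        bank = 0                 # bank counter carry clocks the J-counter
        if j < max_j:
            j += j_inc
            continue
        return order             # J-counter carry: end of working block


# Two words per pixel block row, three banks, two rows per pixel block.
order = transfer_order(max_i=1, max_bank=2, j_inc=2, max_j=2)
```

In this small example the first row is written across banks zero, one and two before any bank receives its second row, matching the row-by-row, bank-by-bank order described above.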

A block counter 636 counts the number of working blocks in a strip to generate a last working block in strip (LWBS) signal. A maximum block count register 638 specifies the number of working blocks within a strip for the block counter 636. The block counter 636 is incremented by an end of working block signal (EWB) that is output by the J-counter 634. The block counter 636 is reset each time the maximum number of working blocks in a strip is reached.

A programmable I-counter increment register (I-Increment) 640 sets the increment of the I-counter 630. The I-counter increment register 640 determines the address offset between successive reads or writes within a pixel block row. The I-counter increment register 640 is usually set equal to one.

For the I-counter 630, a maximum I-count register, Max_I-Count, 642 sets the number of words in a pixel block row for all full pixel blocks. A maximum I-last register, Max_I_Last, 644 sets the number of words in a pixel block row for the last pixel block of a strip to accommodate partially full pixel blocks. In response to the state of the end of working block signal, a multiplexor 646 supplies the value of the Max_I-Count register 642 or the value of the Max_I_Last register 644 to the I-counter 630.

The J-counter 634 has a programmable increment value register, J-Increment, 652 which determines the address offset between the first word of successive rows of a pixel block. A maximum J-count value register, Max_J-Count, 654 determines the offset between the first word of the first row of a pixel block and the first word of the last row of a pixel block.

The bank counter 632 has a maximum count value register, MaxBank, 656 that determines the number of banks to transfer data to or from for each full working block. A second maximum count value register, MaxBankLast, 658 determines the number of banks to transfer data to or from for the last working block of a strip. In response to the state of the last working block in a strip signal (LWBS), a multiplexor 660 supplies the value of the MaxBank register 656 or the value of the MaxBankLast register 658 to the bank counter 632. In this way, a subset of the banks can be used for the last working block in a strip.

For example, in one implementation, the buffer controller 444 countersettings are determined as follows:

-   I-Increment 640, the address increment between successive words of each pixel block row, is set to one.
-   Max_I-Count 642 is set equal to the number of words per pixel block row minus one.
-   Max_I_Last 644 is set equal to the number of words per row for the last pixel block of a strip minus one. If all pixel blocks have the same size, then the value in Max_I_Last 644 is set equal to the value in Max_I-Count 642.
-   J-Increment 652 is set equal to the offset, in words, within an ITP buffer between consecutive pixel block rows. The value in J-Increment 652 is usually set equal to the value in Max_I-Count 642 plus one. When transferring one-dimensional data, the values of J-Increment 652 and Max_J-Count 654 can be set equal to zero, resulting in a single row of data being transferred to each bank.
-   Max_J-Count 654 is equal to the offset, in words, of the first word of the last row of a pixel block. The value of Max_J-Count 654 is usually set as follows:
    Max_J-Count = (the number of working block rows − 1) * J-Increment 652.
-   MaxBank is equal to the last bank to be loaded for all but the last working block of each strip. This will usually be equal to the number of active pixel blocks minus one if non-convolutional algorithms are being used, or the number of active pixel blocks if convolutional algorithms are being used, to provide the boundary data for the last active pixel block.
-   MaxBankLast is equal to the last bank to be loaded for the last working block of each strip. If the number of pixel blocks per strip is exactly divisible by the number of active pixel blocks, the value of MaxBankLast will be equal to MaxBank; otherwise, the value of MaxBankLast is determined as follows:
    int(((number of pixel blocks per strip) mod (number of active pixel blocks)) − 1).
-   MaxBlock is equal to the number of full or partial working blocks per strip minus two. The value of MaxBlock is determined as the integer result of:
    $\frac{\mathrm{WordsPerImageRow} - \mathrm{WordsPerWorkingBlockRow} - 1}{\mathrm{WordsPerWorkingBlockRow}}$
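The settings listed above can be worked through for the 584-pixel-wide, eight-bit example image. The following Python sketch is a minimal illustration under stated assumptions: the function name, parameter names and dictionary keys are hypothetical, pixel block widths are assumed to be whole numbers of thirty-two bit words, and the MaxBankLast rule is applied exactly as written for both convolutional and non-convolutional use.

```python
def buffer_controller_settings(image_width, block_width=16, block_rows=16,
                               active_blocks=5, convolutional=False,
                               bytes_per_pixel=1, word_bytes=4):
    """Derive the counter settings from the image and pixel block geometry."""
    words_per_block_row = block_width * bytes_per_pixel // word_bytes
    words_per_image_row = image_width * bytes_per_pixel // word_bytes
    # Width of the last pixel block of a strip (full width if evenly divisible).
    last_pixels = image_width % block_width or block_width
    words_last_block_row = last_pixels * bytes_per_pixel // word_bytes
    blocks_per_strip = -(-image_width // block_width)  # ceiling division

    max_i_count = words_per_block_row - 1
    j_increment = max_i_count + 1
    max_bank = active_blocks if convolutional else active_blocks - 1
    remainder = blocks_per_strip % active_blocks
    max_bank_last = max_bank if remainder == 0 else remainder - 1
    words_per_wb_row = active_blocks * words_per_block_row

    return {
        "I_Increment": 1,
        "Max_I_Count": max_i_count,
        "Max_I_Last": words_last_block_row - 1,
        "J_Increment": j_increment,
        "Max_J_Count": (block_rows - 1) * j_increment,
        "MaxBank": max_bank,
        "MaxBankLast": max_bank_last,
        "MaxBlock": (words_per_image_row - words_per_wb_row - 1) // words_per_wb_row,
    }


settings = buffer_controller_settings(584)
```

For the example image, a pixel block row is four words, an image row is 146 words and a working block row is twenty words, so MaxBlock evaluates to six, which is the eight full or partial working blocks per strip minus two.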

A buffer controller begins a sequence to transfer image data when it is enabled and has been given ownership of a buffer level by the buffer owner register 600 (FIG. 13). When a buffer controller completes a data transfer sequence on a particular level of local buffers, the buffer controller generates a control signal to toggle the toggle bit 622 connected to the associated multiplexor for that level, which switches the owner of that level to the owner specified in the next owner register 610. The counters 630, 632, 634, 636 and the associated registers 638, 640, 642, 644, 652, 654, 656, 658 are loaded by the microprocessor (RISC).

Master Controller

In FIG. 15, the pipelined processor stages 670 of the SIMD master controller 450 are shown. The SIMD master controller 450 supports arithmetic instructions and many addressing modes by utilizing a very long instruction word (VLIW). In the pipeline 670, each stage or phase has a register that stores the portion of the VLIW 672 with the control signals for that stage and subsequent stages.

In phase zero 674, the master controller 450 fetches the VLIW from the control store and places the VLIW in VLIW pipeline register zero 672-0. In phase one 676, the master controller 450 decodes the VLIW that was retrieved in phase zero. Based on the instruction decode, the master controller 450 broadcasts control signals and coefficients to all the processing elements simultaneously.

Two phases, phases two and three, are used to generate an effective address to access the local buffers. In phase two 678, a portion of the effective address is determined from the decoded VLIW 672-2. In phase three 680, the generation of the effective address is completed and the effective address is simultaneously broadcast to the local buffers. The effective address may be an effective byte address. The buffer owner register, at least in part, specifies which level of local buffers responds to the effective address.

In phase four 682, the VLIW 672-4 provides the control signals for a multiplexor/latch stage of each processing element (PE). The multiplexor/latch stage supplies the inputs to an arithmetic stage of each PE. In phase five 684, the VLIW 672-5 provides the control signals for the arithmetic stage to perform a computation based on the inputs from phase four. In phase six 686, the VLIW 672-6 provides the control signals for a descale/write stage of the PE.

Each PE has read and write access to up to eight buffer blocks. For most operations, each PE operates on data in the local buffer in its own bank. The PEs are also connected to the local buffers of the adjacent bank to the right to support horizontal filtering operations. Each phase uses one clock cycle, and a portion of the VLIW 672 and the results of the previous stage are passed to the next stage.

In FIG. 16, a block diagram of the various components of the SIMD master controller 450 of FIG. 4 is shown. In the SIMD master controller 450, a SIMD master controller processing unit 690 is coupled to three static memories: a SIMD program memory 692, a coefficient memory 694, and an address mapping look-up table, or address LUT, 696. The SIMD master controller processing unit 690 also is coupled to an interrupt status register 698, an interrupt mask register 702, a control and status register (CSR) 704, counters 706, a stack 708, input pointer registers 710 and output pointer registers 712. The SIMD master controller processing unit 690 provides local buffer byte addresses, control signals and coefficients to the PEs.

To minimize the width of the instruction word, the SIMD master controller 450 uses a pointer configuration table and a descale configuration table to provide a semi-dynamic way of supplying instructions with parameters. An input pointer configuration register 714 is associated with each input pointer register 710. An output pointer configuration register 716 is associated with each output pointer register 712. The values in the pointer configuration registers 714 and 716 specify the pointer type, the buffer level and the counters associated with that pointer.

A three-bit field in each VLIW 672 (FIG. 15) allows the programmer to select a descale configuration for each arithmetic instruction. Descale configurations in the descale register specify the upper and lower bounds check values, absolute value selection and other descaling parameters.

The SIMD master controller processing unit 690 executes an image processing procedure 720 that is stored in the SIMD program memory 692. The microprocessor (RISC) loads the image processing procedure 720 into the program memory 692 via the program storage data port 722.

The microprocessor (RISC) can read from and write to the interrupt status register 698, the interrupt mask register 702, the control and status register (CSR) 704, the coefficient memory 694 and the address look-up table 696. The RISC processor stores data in the coefficient memory 694 and the address LUT 696 via the coefficient storage data port 724 and Z-LUT data port 726, respectively. The interrupt status register 698 is a read/clear register that indicates the status of each of the interrupt bits. The interrupt bits are masked by respective bits in the interrupt mask register 702, and the unmasked interrupt bits are “ORed” to form the interrupt request. The interrupt bits are readable by the RISC processor and are cleared by writing a zero. The interrupt bits are defined as follows:

-   BUFF_IRQ: buffer interrupt request;
-   IPTC_HALTED: ITP Master controller 450 halted;
-   OBA_DONE: Output block addresser done;
-   IBA_DONE: Input block addresser done;
-   HUFF_ERR: Huffman encoder/decoder error; and
-   HUFF_DONE: Huffman encoder/decoder done.

The interrupt mask register 702 stores the interrupt mask bits for the interrupt status register 698. The microprocessor (RISC) can read and write each of these bits. A value of one causes the interrupt to be masked. On power-up, the interrupt mask bits are set to one to disable all interrupts.
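The masking and “ORing” of the interrupt bits can be expressed compactly. In the following Python sketch, the bit positions assigned to the interrupt names are an assumption made only for illustration, as are the identifier names; the behavior shown is that a mask bit of one suppresses the corresponding interrupt and that any remaining set bit raises the interrupt request.

```python
# Hypothetical bit positions; the actual register layout is not specified here.
IRQ_BITS = {
    name: position
    for position, name in enumerate(
        ["BUFF_IRQ", "IPTC_HALTED", "OBA_DONE", "IBA_DONE", "HUFF_ERR", "HUFF_DONE"]
    )
}
ALL_BITS = (1 << len(IRQ_BITS)) - 1
POWER_UP_MASK = ALL_BITS  # all mask bits set to one: every interrupt disabled


def bit(name):
    """Return the single-bit value for a named interrupt."""
    return 1 << IRQ_BITS[name]


def interrupt_request(status, mask):
    """OR together the status bits whose mask bit is zero."""
    return (status & ~mask & ALL_BITS) != 0


# Unmask only IBA_DONE by clearing its mask bit.
mask = POWER_UP_MASK & ~bit("IBA_DONE")
```

With this mask, a set IBA_DONE status bit raises the interrupt request while a set OBA_DONE status bit does not.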

The CSR 704 has a HALT bit that the microprocessor (RISC) clears or sets to start or stop the ITP. The CSR 704 also has a five-bit processing element enable field. Each bit in the processing element enable field enables the corresponding processing element when set to one, and disables the corresponding processing element when set to zero. The CSR 704 also includes an instruction pointer which points into the SIMD program memory 692.

Effective Address Generation

In FIG. 17, a block diagram of the pointer configuration sets for the input pointers 710 is shown. In phases two and three of the pipeline stages, the SIMD master controller 450 generates an “effective address”. The pointers 710 and pointer configuration sets 714 are preconfigured such that, when referenced by a VLIW instruction, an effective address generation circuit (FIG. 18) generates the effective address in one clock cycle. Such a configuration provides complex addressing that is performed in a short time.

Each input pointer register 710 is associated with an input pointer configuration register 714. Each pointer can be loaded with a nine-bit base address. Bits one and zero from the VLIW enable the pointer selection multiplexor 732 and the pointer configuration selection multiplexor 734 to select a specified pointer register 710 and pointer configuration register 714. The pointer selection multiplexor 732 supplies the value stored in the specified pointer register 710 to the effective address generation logic, described below.

The value in the specified pointer configuration register 714 that is output from the pointer configuration selection multiplexor 734 enables a pointer set multiplexor 738 to select one of the predefined pointer configuration sets 736 to be used in the effective address generation circuit. The predefined pointer configuration sets are registers that include and specify the following fields:

-   HCNTRSEL[1:0]: a horizontal counter selection signal in the selected predefined pointer configuration set that selects one of the four counters 742 of FIG. 18 as the horizontal counter;
-   VCNTRSEL[1:0]: a vertical counter selection signal in the selected predefined pointer configuration set that selects one of the four counters 742 of FIG. 18 as the vertical counter;
-   HDIMEN[9:0]: defines a nine-bit horizontal dimension;
-   BYTE/SHORT: defines the format of the specified pointer as either a byte or a short integer. Latch 739a stores the BYTE/SHORT bit for use in subsequent stages;
-   SIGNED/UNSIGNED: defines the format of the specified pointer as either signed or unsigned;
-   BUFLVL[1:0]: selects one of the buffer levels. A BUFLVL latch 739b stores the BUFLVL[1:0] bits for use in subsequent stages. Latch 739c latches the value of IPTR[9:0] for use in subsequent stages.

In FIG. 18, the effective address generation circuit 750, which utilizes the control signals and the pointer of FIG. 17, is shown. The effective address generation circuit 750 generates the effective address to access the local buffers. As shown using the following C pseudo-code, the effective address generation circuit 750 provides the following addressing modes:

-   *Ptr++: This addressing mode increments the value of the specified pointer by one. The “*” indicates that the incremented pointer will be used as the address to access a desired location in the local buffers.
-   *((Ptr++)+CTR): This addressing mode increments the specified pointer and adds a value in a specified counter (CTR) to the incremented pointer.
-   *((Ptr++)+CTR+offset): This addressing mode is the same as the previous addressing mode except that an offset is added to the value of the specified pointer in addition to the value in the counter. The offset is provided as a field (HOFF[3:0], VOFF[3:0]) in the VLIW instruction.
-   *((Ptr++)+ZLUT(CTR)+offset): The Z-look-up table addressing mode is the same as the previous addressing mode except that the value in the counter is first mapped through the address look-up table (ZLUT).
-   *(2DPtr): This is a two-dimensional addressing mode and will be discussed below with respect to the effective address generation circuit 750.

For two-dimensional addressing, the image processing procedure 750 will store a vertical count value (vcounter) in one of the counters 742 and a horizontal count value (hcounter) in another counter 742. The image processing procedure 750 will also store predefined configuration settings in one of the pointer set configuration registers, specifying the horizontal dimension (hdimension), the horizontal counter selection (HCNTRSEL) bits, the vertical counter selection (VCNTRSEL) bits, byte addressing (BYTE), unsigned addressing (UNSIGNED) and the buffer level (BUFLVL). The image processing procedure will then store a base pointer value in one of the pointer registers 710 and a pointer configuration value in the associated pointer configuration register. The pointer configuration value specifies which of the pointer set configuration registers to use. For example, if the value in the specified pointer configuration register is equal to two, multiplexor 738 will provide the fields from pointer set configuration register two (PCFG2) to the effective address generation circuit. After defining these initial conditions, instructions may be executed that perform the two-dimensional addressing. In this way, by changing the horizontal and vertical offsets in the instructions, image data in the local buffers can be accessed in a complex and efficient manner.

The VLIW instruction has fields that specify a horizontal offset (hoff[3:0]) and a vertical offset (voff[3:0]). The effective address generation circuit 750 generates the effective address using the following relationship:

Effective address = Ptr + ((vcounter + voffset) * hdimension) + (hcounter + hoffset).
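The relationship above can be sketched in C as follows. This is a minimal software illustration, not the circuit itself: the function name `effective_address_2d` is hypothetical, and the treatment of the four-bit offset fields as unsigned values is an assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software model of the two-dimensional effective address.
 * ptr        : base pointer value held in the selected pointer register
 * vcounter   : value of the selected vertical counter
 * hcounter   : value of the selected horizontal counter
 * voff, hoff : VLIW offset fields, assumed unsigned here (0..15)
 * hdimension : horizontal dimension taken from HDIMEN[9:0]
 */
uint32_t effective_address_2d(uint32_t ptr,
                              uint32_t vcounter, uint32_t voff,
                              uint32_t hcounter, uint32_t hoff,
                              uint32_t hdimension)
{
    assert(voff <= 15u && hoff <= 15u); /* four-bit offset fields */
    return ptr + ((vcounter + voff) * hdimension) + (hcounter + hoff);
}
```

For example, with Ptr = 100, vcounter = 2, voffset = 1, hcounter = 3, hoffset = 1 and hdimension = 16, the sketch yields 100 + (3 * 16) + 4 = 152.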

To generate the two-dimensional effective address, the VLIW instruction includes a field that sets the value of the IPTR[1:0] bits to select one of the pointers and pointer configuration registers, a field that specifies the horizontal offset (hoff[3:0]) and another field that specifies the vertical offset (voff[3:0]).

The vertical and horizontal selection signals, VCNTRSEL[1:0] and HCNTRSEL[1:0], cause counter selection multiplexors, 744 and 746, respectively, to output the value stored in the selected counter. An adder 748 adds the vertical offset (voff[3:0]) to the value of the selected vertical counter. A multiplier 750 multiplies the output of the adder 748 by the value of the horizontal dimension (HDIMEN[9:0]). For two-dimensional addressing, a first 2-D bit 750 will be equal to one to allow the AND gate 752 to provide the output of the multiplier. For other than two-dimensional addressing, the first 2-D bit 750 is set equal to zero and the AND gate 752 provides an output of zero.

Meanwhile, another AND gate 754 performs an AND operation between the value of the selected horizontal counter from multiplexor 746 and a 2-D/counter bit 755. Since the 2-D/counter bit 755 is equal to one for two-dimensional addressing and for *(Ptr+Ctr) addressing, AND gate 754 provides the value of the selected horizontal counter. When the 2-D/counter bit 755 is equal to zero, AND gate 754 outputs a zero. Another adder 756 adds the value of the selected horizontal counter to the horizontal offset (HOFF[3:0]). Multiplexor 758 is enabled to pass the output of adder 756 to another adder 760, which outputs the following result:

((vcounter + voffset) * hdimension) + (hcounter + hoffset).
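The gating described in the two preceding paragraphs can be modeled in software as shown below. This is only a sketch under stated assumptions: an AND gate driven by a one-bit control is modeled as "pass the value or force zero", the function name `offset_sum` is hypothetical, and the offsets are treated as unsigned.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the FIG. 18 offset datapath up to adder 760.
 * two_d_bit         : first 2-D bit gating the multiplier output
 * two_d_counter_bit : 2-D/counter bit gating the horizontal counter
 */
uint32_t offset_sum(int two_d_bit, int two_d_counter_bit,
                    uint32_t vcounter, uint32_t voff,
                    uint32_t hcounter, uint32_t hoff,
                    uint32_t hdimension)
{
    assert(hdimension <= 1023u);                              /* HDIMEN[9:0] is ten bits */
    uint32_t product    = (vcounter + voff) * hdimension;     /* adder 748, multiplier */
    uint32_t vertical   = two_d_bit ? product : 0u;           /* AND gate 752 */
    uint32_t hgated     = two_d_counter_bit ? hcounter : 0u;  /* AND gate 754 */
    uint32_t horizontal = hgated + hoff;                      /* adder 756 */
    return vertical + horizontal;                             /* adder 760 */
}
```

With both control bits set, the function returns the full two-dimensional term. With only the 2-D/counter bit set it returns the counter plus the horizontal offset, and with neither bit set it returns the horizontal offset alone.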

A pointer mode bit 762 is set equal to one, and AND gate 764 allows the output of adder 760 to be stored in latch 759 a. Another latch 759 b latches the output of multiplexor 758. The outputs of latches 759 a and 759 b are supplied to adder 766. Adder 766 adds the specified base address from the specified pointer register 710 to the output of adder 760 to generate the two-dimensional effective address. Multiplexor 768 supplies the two-dimensional effective address as a byte address to the local buffers in response to the BYTE/SHORT bit of the selected pointer set configuration register 736.

Shifter 769 shifts the output of adder 766 left by one bit, effectively multiplying the output of adder 766 by two. The multiplexor 768 outputs the result of the shifter 769 when the BYTE/SHORT bit indicates short.
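The byte/short selection can be illustrated with a short C sketch. The function name `local_buffer_address` and the encoding of the BYTE/SHORT bit as `is_short` (1 for short, 0 for byte) are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of adder 766, shifter 769 and multiplexor 768.
 * base_ptr   : base address from the specified pointer register
 * offset_sum : output of adder 760
 * is_short   : assumed encoding of the BYTE/SHORT bit (1 = short, 0 = byte)
 */
uint32_t local_buffer_address(uint32_t base_ptr, uint32_t offset_sum,
                              int is_short)
{
    assert(is_short == 0 || is_short == 1);
    uint32_t sum     = base_ptr + offset_sum; /* adder 766 */
    uint32_t shifted = sum << 1;              /* shifter 769: multiply by two */
    return is_short ? shifted : sum;          /* multiplexor 768 */
}
```

A short access therefore lands at twice the element index, which matches a two-byte element size.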

In an alternate embodiment, latch 762 and “AND-gate” 764 are not used, and the output of adder 760 is supplied directly to adder 766.

Because two cycles, phases two and three, are used to generate the effective address, latches 759 a and 759 b store intermediate results of the effective address generation of phase two, for use in phase three. At the end of phase three, an incrementer 770 increments the value of the IPTR[9:0] pointer by 1. The incrementing is performed during phases four or five and the incremented value is stored back in the specified IPTR register.

Convolutional Filters

Detection of the condition ((hcounter+hoffset)>hdimension) in a given instruction for a given 2-dimensional pointer causes the SIMD master controller 450 to direct all the SIMD local buffer accesses to the next higher bank. When this condition occurs during read operations, processing element zero fetches data from the local buffers of bank one, and more generally, processing element N fetches data from bank N+1. When this condition occurs during write operations, processing element zero writes data to the local buffers of bank one, and more generally, processing element N writes data to bank N+1. A comparator 770 receives the value (hcounter+hoffset) from adder 756 and compares it to the value of HDIMEN[9:0] to generate a comparison signal that indicates when the condition, ((hcounter+hoffset)>hdimension), is true.
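The bank redirection can be sketched in C as follows. The function names `out_of_bounds` and `bank_for_access`, and the `num_pe` parameter, are hypothetical. The comparison uses the strict "greater than" stated above.

```c
#include <assert.h>
#include <stdint.h>

/* Models the comparison signal: true when (hcounter + hoffset) > hdimension. */
int out_of_bounds(uint32_t hcounter, uint32_t hoff, uint32_t hdimension)
{
    return (hcounter + hoff) > hdimension;
}

/* Returns the bank that processing element pe_index accesses.
 * Normally this is its own bank; when the out-of-bounds condition holds,
 * the access is redirected to the next higher bank (N -> N+1).
 * num_pe is the number of processing elements, so banks 0..num_pe exist. */
unsigned bank_for_access(unsigned pe_index, unsigned num_pe,
                         uint32_t hcounter, uint32_t hoff,
                         uint32_t hdimension)
{
    assert(pe_index < num_pe);
    return out_of_bounds(hcounter, hoff, hdimension) ? pe_index + 1u
                                                     : pe_index;
}
```

In this model the last processing element can be redirected to bank `num_pe`, which is why the buffer array needs one more bank than there are processing elements.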

Z-Look-Up Table Mapped Addressing

The two-hundred fifty-six by eight-bit look-up table 696 provides an additional addressing mode that can be selected by the pointer set configuration registers. The z-look-up mode allows the local buffers to be accessed by the image processing procedure in any preconfigured order. For example, the z-look-up mode can be used for the JPEG zig-zag sort or any other address mapping of eight bits. In the z-look-up addressing mode, the effective address is determined by the following relationship: Effective Address = Ptr + ZLUT[counter]. When the Z-LUT selection signal indicates that the Z-look-up mode is enabled, multiplexor 772 outputs a value from the Z-look-up table 696 as specified by the selected counter 742. Adder 774 adds the value from the Z-LUT 696 to the output of adder 756.
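As an illustration of the relationship Effective Address = Ptr + ZLUT[counter], the sketch below models the table as a 256-entry array of eight-bit values. The function names are hypothetical, and only the first eight entries of the standard 8x8 JPEG zig-zag order are loaded, purely to keep the example short.

```c
#include <assert.h>
#include <stdint.h>

#define ZLUT_ENTRIES 256

/* Effective address in Z-look-up mode: Ptr + ZLUT[counter]. */
uint32_t zlut_effective_address(uint32_t ptr,
                                const uint8_t zlut[ZLUT_ENTRIES],
                                uint8_t counter)
{
    assert(zlut != 0);
    return ptr + zlut[counter];
}

/* Loads the first eight entries of the 8x8 zig-zag scan order and clears
 * the rest. A real table would hold all sixty-four entries. */
void load_zigzag_prefix(uint8_t zlut[ZLUT_ENTRIES])
{
    static const uint8_t prefix[8] = {0, 1, 8, 16, 9, 2, 3, 10};
    int i;
    for (i = 0; i < ZLUT_ENTRIES; i++) {
        zlut[i] = 0;
    }
    for (i = 0; i < 8; i++) {
        zlut[i] = prefix[i];
    }
}
```

Stepping the counter from zero upward then visits the buffer at offsets 0, 1, 8, 16 and so on from the base pointer, without the image processing procedure computing the order itself.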

SIMD Processing Element

In FIG. 19, the multiplexor/latch stage 682 (FIG. 15) of phase four in an exemplary SIMD processing element is shown. A local buffer multiplexor 802 selects an output from one of the local buffers based on BUFLVL[1:0] (FIG. 17) and the out-of-bounds signal from FIG. 18. When the out-of-bounds signal is set, the local buffer multiplexor 802 selects one of the local buffers from the adjacent bank. In FIG. 19, the local buffer multiplexor for bank 0 is shown; the other local buffer multiplexors for the processing elements in the other banks are configured in the same manner.

In accordance with the VLIW in phase 4, a source one multiplexor 804 may supply the data from one of the local buffers, data from look-up table zero (LUT0), data from look-up table one (LUT1), data from register two of the processing element (REG2), data from register three (REG3) of the processing element, a zero input, data from a processing element (PEID), or from a descale accumulator (DESACC) to a source 1 latch 806.

In accordance with the VLIW in phase 4, a source two multiplexor 808 may supply the data from one of the local buffers, data from look-up table zero (LUT0), data from look-up table one (LUT1), data from register two of the processing element (REG2), data from register three (REG3) of the processing element, a zero input, data from a processing element (PEID), or from the coefficient memory to the source 2 latch 810.

In FIG. 20, an exemplary processing element arithmetic stage 684 of phase five of FIG. 15 is shown. A first arithmetic logic unit (ALU) multiplexor (ALU MUX 1) 820 supplies either the value from processing element register zero (REG0), processing element register one (REG1) or the source one latch 806 (Source 1) (FIG. 19) to a logic function circuit 822, a selector 824, an adder 826 and a multiplier 828. A second ALU multiplexor (ALU MUX 2) 830 supplies either the value from processing element register zero (REG0), processing element register one (REG1) or the source two latch (Source 2) 810 (FIG. 19) to the logic function circuit 822, the selector 824, the adder 826 and the multiplier 828.

The output of the selector 824, the adder 826 and the multiplier 828 is supplied to an ALU descaler 839 in accordance with the VLIW. The ALU descaler 839 will be described below. An adder 832 adds the output from the ALU descaler 839 to the value stored in the accumulator 834, if specified in the VLIW, and supplies the sum to the descale/write stage 686 (FIG. 15).

In the logic function circuit 822, a true/false signal is generated based on a selected function that is applied to the outputs of the first and second ALU multiplexors, 820 and 830, respectively, in accordance with the VLIW. The selected functions include a greater than function, a less than function, an equals function, and the logical AND, OR and exclusive-or (XOR) functions.

The true/false signal output by the logic function circuit 822 is supplied to a boolean accumulator 834. In the boolean accumulator 834, the true/false signal is supplied to another logic function generator 836. Flag bit zero (FLAG 0) and Flag bit one (FLAG 1) are also input to the logic function generator 836. The logic function generator 836 stores the result of the specified logic operation in FLAG 0 or FLAG 1 in accordance with the VLIW. The logic function generator 836 also stores the result of the specified logic operation in a conditional write bit 842 which is supplied to the descale/write stage. The logic function generator 836 has circuits that perform any of the following logic operations in accordance with the VLIW: AND, OR, XOR, NOT, and SELECT.

In FIG. 21, the processing element descale/write stage 686 of phase six of FIG. 15 is shown. The output from the accumulator 834 (FIG. 20) is supplied to an accumulator (ACC) descaler 850 before being stored in one of the processing element registers: register zero (REG0) 852, register one (REG1) 854, register two (REG2) 856 or register three (REG3) 858 in accordance with the VLIW. The output from the ACC descaler 850 can also be stored in the local buffers in accordance with the VLIW.

Each processing element includes a two-hundred fifty-six by sixteen-bit look-up table 860 in memory. The look-up table 860 is divided into two eight-bit tables, look-up table A (LUTA) and look-up table B (LUTB), each of which makes it possible to look up or transform an eight-bit value to any other eight-bit value. When LUTA or LUTB is selected as the destination for a given arithmetic instruction, selection of the ALU data bits that address the look-up table is determined by the descale configuration set specified in the instruction. The eight-bit look-up table result can be signed or unsigned depending on the source specified. The RISC loads the look-up table 860 via LUT data port 861.
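One way to picture the division of the sixteen-bit table into two eight-bit tables is sketched below. The text does not say which byte of each entry holds which table, so placing LUTA in the low byte and LUTB in the high byte is purely an assumption of this example, as are the type and function names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the 256 x 16-bit look-up table 860. */
typedef struct {
    uint16_t entry[256];
} pe_lut_t;

/* LUTA: assumed to occupy the low byte of each sixteen-bit entry. */
uint8_t lut_a(const pe_lut_t *lut, uint8_t index)
{
    assert(lut != 0);
    return (uint8_t)(lut->entry[index] & 0xFFu);
}

/* LUTB: assumed to occupy the high byte of each sixteen-bit entry. */
uint8_t lut_b(const pe_lut_t *lut, uint8_t index)
{
    assert(lut != 0);
    return (uint8_t)(lut->entry[index] >> 8);
}
```

Under this assumption a single sixteen-bit write loads the corresponding entry of both tables at once.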

In FIG. 22, the ACC descaler 850 of FIG. 21 is shown. The microprocessor (RISC) or SIMD master controller 450 configures at least one of eight descale configuration set registers 870 b. The VLIW supplies a value to a descale configuration register 870 a which causes a multiplexor 870 c to output the values stored in the selected descale configuration set register to supply the control signals to a shifter 872 and a rounding circuit 874. A multiplexor 876 outputs either the rounded value from the rounding circuit 874 or the absolute value from the absolute value circuit 878.

Lower and upper comparators, 880 and 882, compare the output of multiplexor 876 to the values in the lower bound descale register 884 and the upper bound descale register 886, respectively. In accordance with the VLIW and the result of the comparison, the value in the lower bound or upper bound descale register, or the output of multiplexor 876, is output.
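The shift, round, absolute value and bound steps of the ACC descaler can be approximated in software as shown below. Several details are assumptions of this sketch rather than statements from the text: the shifter is taken to shift right, rounding is taken to be round-half-away-from-zero, the absolute value path is taken to operate on the shifted and rounded value, and the structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical contents of one descale configuration set. */
typedef struct {
    unsigned shift;   /* assumed right-shift amount for shifter 872 */
    int      use_abs; /* nonzero selects the absolute value path */
    int32_t  lower;   /* lower bound descale register 884 */
    int32_t  upper;   /* upper bound descale register 886 */
} descale_cfg_t;

int32_t acc_descale(int32_t acc, const descale_cfg_t *cfg)
{
    assert(cfg != 0 && cfg->shift < 32u && cfg->lower <= cfg->upper);

    /* Work on the magnitude so that no negative value is right-shifted. */
    int64_t mag    = acc < 0 ? -(int64_t)acc : (int64_t)acc;
    int64_t half   = cfg->shift > 0u ? ((int64_t)1 << (cfg->shift - 1u)) : 0;
    int64_t scaled = (mag + half) >> cfg->shift;   /* shifter 872, rounding 874 */

    /* Multiplexor 876: signed rounded value or its absolute value. */
    int64_t value = (acc < 0 && !cfg->use_abs) ? -scaled : scaled;

    /* Comparators 880 and 882: substitute a bound when it is exceeded. */
    if (value < cfg->lower) {
        return cfg->lower;
    }
    if (value > cfg->upper) {
        return cfg->upper;
    }
    return (int32_t)value;
}
```

With a shift of four and bounds of 0 and 255, an accumulator value of 1000 descales to 63, a value of 5000 saturates at 255, and a value of -1000 is held at 0 unless the absolute value path is selected.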

In FIG. 23, an ALU descaler 839 is shown. The ALU descaler descales the values output from the selector 824, the adder 826 and the multiplier 828 in response to the VLIW. The absolute value circuit 894 operates in accordance with one of the descale configuration set registers 870 b specified by the descale configuration register 870 a. The multiplexor 870 c outputs the values of the selected descale configuration set register. A shift right circuit 892 shifts the output of the selector 824. An absolute value circuit 894 provides the absolute value of the value output by the adder 826. A shift left circuit 896 shifts either the value output by the adder or the value output by the absolute value circuit 894 in accordance with the VLIW.

An arithmetic shift right circuit 898 shifts the value output by the multiplier in accordance with the VLIW. A rounding circuit 900 rounds either the value output by the arithmetic shift right circuit 898 or the value output by the multiplier in accordance with the VLIW.

A multiplexor 902 supplies the output from the shift right circuit 892, the shift left circuit 896 or the rounding circuit 900 to the accumulator in accordance with the VLIW. The shift left circuit 896 and rounding circuit 900 include multiplexors that are responsive to the control signals from the descale configuration set register to select a specified one of the two inputs.

In FIG. 24, a general topology of the SIMD master controller 450 of FIG. 4 is shown. In this architecture, any number N of banks with processing elements can be added, making the architecture expandable.

In FIG. 25, a general topology of the SIMD master controller of FIG. 24 with a second SIMD master controller is shown. Additional levels of local buffers are added between the master controllers, allowing the master controllers to exchange data using those local buffers. Each master controller has the architecture shown in FIG. 16.

SIMD Instruction Set

The arithmetic and logical instructions use a VLIW having a fixed format. Table three below shows the format of the VLIW. Line one shows the fields and line two shows the number of bits in each field. The dashes “---” indicate that those bits are shared between the adjacent fields.

TABLE 3
VLIW for SIMD processor

OPCODE   ALUOP   DESCFG   DESTOP   ---   SOURCE OP (for Source 2)   SOURCE OP (for Source 1)
6        5       3        8        3     7                          10

The VLIW supports an operation code (OPCODE), an ALU operation select (ALUOP), a descale configuration select (DESCFG), a destination operand (DESTOP), three shared bits, a source operand (SOURCE OP) for Source 2 and a source operand (SOURCE OP) for Source 1.
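The field widths listed above add up to forty-two bits. A possible software encoding is sketched below. The bit ordering is an assumption of this sketch, not something the text states: OPCODE is placed in the most significant bits and the Source 1 operand in the least significant bits, following the left-to-right order of the table. The three shared bits are kept as a separate field, and the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoded form of the 42-bit VLIW. */
typedef struct {
    uint8_t  opcode; /* 6 bits  */
    uint8_t  aluop;  /* 5 bits  */
    uint8_t  descfg; /* 3 bits  */
    uint8_t  destop; /* 8 bits  */
    uint8_t  shared; /* 3 bits shared between the adjacent fields */
    uint8_t  src2;   /* 7 bits, source operand for Source 2 */
    uint16_t src1;   /* 10 bits, source operand for Source 1 */
} vliw_t;

/* Packs the fields into bits [41:0] of a 64-bit word. */
uint64_t vliw_pack(vliw_t f)
{
    assert(f.opcode < 64 && f.aluop < 32 && f.descfg < 8);
    assert(f.shared < 8 && f.src2 < 128 && f.src1 < 1024);
    return ((uint64_t)f.opcode << 36) | ((uint64_t)f.aluop  << 31) |
           ((uint64_t)f.descfg << 28) | ((uint64_t)f.destop << 20) |
           ((uint64_t)f.shared << 17) | ((uint64_t)f.src2   << 10) |
           (uint64_t)f.src1;
}

/* Extracts the fields again using the same assumed layout. */
vliw_t vliw_unpack(uint64_t w)
{
    vliw_t f;
    f.opcode = (uint8_t)((w >> 36) & 0x3Fu);
    f.aluop  = (uint8_t)((w >> 31) & 0x1Fu);
    f.descfg = (uint8_t)((w >> 28) & 0x07u);
    f.destop = (uint8_t)((w >> 20) & 0xFFu);
    f.shared = (uint8_t)((w >> 17) & 0x07u);
    f.src2   = (uint8_t)((w >> 10) & 0x7Fu);
    f.src1   = (uint16_t)(w & 0x3FFu);
    return f;
}
```

Packing and then unpacking a word returns the original field values, and every packed word fits below 2^42.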

The ALUOP field selects the ALU operation to be performed and is encoded as shown in table four below. In table four, ADEST indicates that arithmetic destination operations are used, not the binary accumulator operands (BACCA or BACCB). DEST indicates any destination operand, both arithmetic and boolean. The source one and source two operands are designated as SRC1 and SRC2, respectively. The syntax of the operation is shown using C pseudo-code. Some ALU opcodes (ALUOP) perform two operations simultaneously. The ^^ operator performs a squaring operation on source operand one. Any source one operand may be used for squaring except LS1.

TABLE 4
Arithmetic Instructions

ALUOP   OP TYPE       SYNTAX
00000   ARITH         ADEST = SRC1 * SRC2
        ARITH         ADEST += SRC1 * SRC2
00001   ARITH         ADEST = SRC1 + SRC2
        ARITH         ADEST += SRC1 − SRC2
00010   ARITH         ADEST = SRC1 − SRC2
        ARITH         ADEST += SRC1 − SRC2
00011   ARITH         ADEST = SRC1 ^^ 2
        ARITH         ADEST += SRC1 ^^ 2
01100   BITWISE       ADEST = SRC1 & SRC2
01101   BITWISE       ADEST = SRC1 | SRC2
01110   BITWISE       ADEST = SRC1 ^ SRC2
10000   SELECT        ADEST = BACCA ? SRC1 : SRC2
        CONDITIONAL   ADEST = SRC1 IF BACCA
10001   SELECT        ADEST = BACCB ? SRC1 : SRC2
        CONDITIONAL   ADEST = SRC1 IF BACCB
10010   SELECT        ADEST = !BACCA ? SRC1 : SRC2
        CONDITIONAL   ADEST = SRC1 IF !BACCA
10011   SELECT        ADEST = !BACCB ? SRC1 : SRC2
        CONDITIONAL   ADEST = SRC1 IF !BACCB
10100   SELECT        ADEST = !BACCA && !BACCB ? SRC1 : SRC2
        CONDITIONAL   ADEST = SRC1 IF !BACCA && !BACCB
10101   SELECT        ADEST = !BACCA && BACCB ? SRC1 : SRC2
        CONDITIONAL   ADEST = SRC1 IF !BACCA && BACCB
10110   SELECT        ADEST = BACCA && !BACCB ? SRC1 : SRC2
        CONDITIONAL   ADEST = SRC1 IF BACCA && !BACCB
10111   SELECT        ADEST = BACCA && BACCB ? SRC1 : SRC2
        CONDITIONAL   ADEST = SRC1 IF BACCA && BACCB
11000   SELECT        DEST = SRC1 > SRC2 ? SRC1 : SRC2
        CONDITIONAL   ADEST = SRC1 IF SRC1 > SRC2
11001   SELECT        DEST = SRC1 < SRC2 ? SRC1 : SRC2
        CONDITIONAL   ADEST = SRC1 IF SRC1 < SRC2
11010   SELECT        DEST = SRC1 >= SRC2 ? SRC1 : SRC2
        CONDITIONAL   ADEST = SRC1 IF SRC1 >= SRC2
11011   SELECT        DEST = SRC1 <= SRC2 ? SRC1 : SRC2
        CONDITIONAL   ADEST = SRC1 IF SRC1 <= SRC2
11100   SELECT        DEST = SRC1 == SRC2 ? SRC1 : SRC2
        CONDITIONAL   ADEST = SRC1 IF SRC1 == SRC2

Select operations are performed based on the result of the comparison or test, with source operand one being written to the destination if the result of the comparison or test is true, and source operand two being written to the destination if the result of the comparison or test is false. The arithmetic accumulator, ACC, is always updated with the result of the select. When any select operation is performed with a boolean accumulator as the destination, the boolean result of the comparison is sent to the specified accumulator.

A conditional write mode is provided for all comparison and test operations. Conditional writes operate in exactly the same manner as the select operations described above, except that if the result of the comparison or test is false, then the output operand is not written. However, the arithmetic accumulator is always written, exactly as it would have been for the corresponding select operation. This allows if-else and case constructs to be built using a sequence of ifs. The conditional write instructions are encoded identically to their select counterparts, except for the opcode field.
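The difference between a select and a conditional write can be shown with a small C model of one processing element. The structure `pe_state_t` and the two function names are hypothetical, and the boolean-accumulator destination case is left out of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical slice of processing element state. */
typedef struct {
    int32_t acc;  /* arithmetic accumulator, ACC */
    int32_t dest; /* destination operand */
} pe_state_t;

/* Select: the destination receives SRC1 when the test is true and SRC2
 * otherwise; the accumulator is always updated with the same result. */
void op_select(pe_state_t *pe, int condition, int32_t src1, int32_t src2)
{
    assert(pe != 0);
    int32_t result = condition ? src1 : src2;
    pe->acc  = result;
    pe->dest = result;
}

/* Conditional write: the accumulator is written exactly as for the select,
 * but the destination is left untouched when the test is false. */
void op_conditional_write(pe_state_t *pe, int condition,
                          int32_t src1, int32_t src2)
{
    assert(pe != 0);
    pe->acc = condition ? src1 : src2;
    if (condition) {
        pe->dest = src1;
    }
}
```

A chain of conditional writes to the same destination, each with a different test, behaves like a sequence of ifs in which the last true test wins.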

Each processing element has a sixteen-by-sixteen signed multiplier, an accumulator, a four-element register file, a sixteen-bit comparator, and two LUTs for data mapping. All the processing elements execute the same instruction simultaneously in lock-step. Additionally, the image transform processor can be implemented as an integrated circuit, or the processing elements can be implemented using discrete components. Although the image transform processor has been described for use with an exemplary electronic digital still camera, the image transform processor can be used with a variety of electronic digital video cameras, scanners and printers. In addition, the present invention can be used with portable electronic devices having an image sensor such as a personal digital assistant (PDA).

In FIG. 26, a flow diagram of an exemplary image transform process of the image transform processor 206 of FIG. 4 is shown. An image created by a device such as a digital camera with a CCD, or a digital image located in a memory or storage device, is made available as an input image at the start of the image transform process 2600. A first portion of the input image is provided to a first buffer in a plurality of buffers 2602. The plurality of buffers are locations in memory that act as temporary storage. A first processing operation is performed on the first portion of the input image, resulting in a first processed image data portion 2604. An example of a first processing operation is uncompressing or formatting the first portion of the input image. The first processed image data portion is stored in a second buffer in the plurality of buffers 2606.

A second portion of the input image is provided in the first buffer 2608. The first portion of the input image is either written over by the second portion of the input image or erased before the second portion is provided to the first buffer. A second processing operation is performed on the first processed image data portion, resulting in a second processed image data portion 2610. An example of a second processing operation is to adjust the color contrast of the first processed image data portion. The second processed image data portion is stored in a third buffer of the plurality of buffers 2612. The first processing operation on the second portion of the input image is performed, resulting in a third processed image data portion 2614. The first processing operation and the second processing operation are shown occurring linearly in time. In alternate embodiments the first processing operation and second processing operation may occur in any order once input image portions are available. In a preferred embodiment both operations occur simultaneously.

The third processed image data portion is stored in the second buffer 2616 and the second processed image data portion is provided on a data path as output image data 2618. If additional input image portions are available 2620, then processing continues at 2602. If no additional input image portions are available 2620, then processing is complete 2620 and the output image has been transformed from the input image.
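The buffer hand-off in the flow of FIG. 26 can be sketched as a small software pipeline. Everything concrete here is an assumption made for illustration: image portions are reduced to single integers, `first_op` and `second_op` are stand-ins for the real processing operations, and the final drain step after the last input portion is added so that the sketch emits every portion.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the two processing operations (hypothetical). */
static int first_op(int portion)  { return portion + 1; }
static int second_op(int portion) { return portion * 2; }

/* Runs n input portions through three buffers and returns the number of
 * output portions written. The output array must have room for n entries. */
size_t transform_pipeline(const int *input, size_t n, int *output)
{
    int buf1 = 0, buf2 = 0, buf3 = 0;
    int buf2_valid = 0;
    size_t produced = 0;
    size_t i;

    assert(input != 0 && output != 0);
    for (i = 0; i < n; i++) {
        buf1 = input[i];               /* 2602 / 2608: load first buffer */
        if (buf2_valid) {
            buf3 = second_op(buf2);    /* 2610, 2612: second op into third buffer */
            output[produced++] = buf3; /* 2618: provide output image data */
        }
        buf2 = first_op(buf1);         /* 2604 / 2614, stored per 2606 / 2616 */
        buf2_valid = 1;
    }
    if (buf2_valid) {                  /* drain the last portion (assumption) */
        buf3 = second_op(buf2);
        output[produced++] = buf3;
    }
    return produced;
}
```

In this serial model the second operation on one portion and the first operation on the next portion sit side by side in each loop iteration, which is where a hardware implementation could run them simultaneously.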

The programmable image transform processor may also be implemented in software. Modeling the activities of a microprocessor in software is generally known by those skilled in the art. Therefore, an exemplary implementation of the programmable image transform processor may also be modeled in software using machine-readable instructions. An embodiment of the method steps employs at least one machine-readable signal bearing medium having machine-readable instructions. Examples of machine-readable signal bearing mediums include computer-readable mediums, such as a magnetic storage medium (e.g., floppy disks), an optical storage medium like a compact disk (CD) or digital video disk (DVD), a biological storage medium, or an atomic storage medium, a discrete logic circuit(s) having logic gates for implementing logic functions upon data signals, an application specific integrated circuit having appropriate logic gates, a programmable gate array(s) (PGA), a field programmable gate array (FPGA), a random access memory device (RAM), a read only memory device (ROM), an electrically erasable programmable read-only memory (EEPROM), or equivalent. Note that the computer-readable medium could even be paper or another suitable medium upon which the computer instruction is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.

Additionally, machine-readable signal bearing medium includes computer-readable signal bearing mediums. Computer-readable signal bearing mediums have a modulated carrier signal transmitted over one or more wire based, wireless or fiber optic networks or within a system. Examples include one or more wire based, wireless or fiber optic networks, such as the telephone network, a local area network, the Internet, or a wireless network, having a component of a computer-readable signal residing in or passing through the network. The computer-readable signal is a representation of one or more machine instructions written in or implemented with any number of programming languages.

Furthermore, the multiple process steps implemented utilizing a programming language, comprising an ordered listing of executable instructions for implementing logical functions, can be embodied in any machine-readable signal bearing medium. The ordered listing of executable instructions for implementing logical functions can be utilized by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a controller-containing system having a processor, microprocessor, digital signal processor, or discrete logic circuit functioning as a controller, or another system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions.

While various embodiments of the invention have been described, it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible that are within the scope of this invention.

1-46. (canceled)
47. An image processor for processing image data in a system, said system having a microprocessor, said image processor comprising: an arithmetic processing block including a buffer array and an N number of processing elements, said buffer array having an M number of rows and an N+1 number of columns, each of said N+1 number of columns defining each of an N+1 number of memory banks, wherein each of said N number of processing elements is associated with a corresponding one of said N+1 number of memory banks, and wherein (N+1)th memory bank of said N+1 number of memory banks provides boundary data for Nth processing element of said N number of processing elements; a controller configured to control said N number of processing elements, said controller being in communication with said microprocessor of said system.
48. The image processor of claim 47, wherein (N+1)th memory bank of said N+1 number of memory banks provides said boundary data for said Nth processing element of said N number of processing elements while convolutional algorithm is being applied to said image data.
49. The image processor of claim 47, wherein said controller reads said image data in one of said N+1 number of memory banks, performs computation on said image data, and stores said images data in a different one of said N+1 number of memory banks.
50. The image processor of claim 47, wherein said controller is a SIMD controller and simultaneously controls said N number of processing elements.
51. The image processor of claim 47 further comprising a programmable input addresser for transferring said image data from a data source to said arithmetic processing block by providing a source address onto a source address path, said source address identifying said data source.
52. The image processor of claim 47 further comprising a programmable output addresser for transferring said image data from said arithmetic processing block to a memory by providing a write address onto a write path, said write address identifying a write address in said memory for storage of said image data.
53. The image processor of claim 47 further comprising a Huffman encoder for encoding said image data received from said controller to generate an encoded image data, and a Huffman decoder for decoding said encoded image data.
54. The image processor of claim 53 further comprising one or more Huffman control registers for use by said Huffman encoder and said Huffman decoder.
55. An image processor for processing image data in a system, said system having a microprocessor, said image processor comprising: an arithmetic processing means for processing said image data, said arithmetic processing means including a buffer array and an N number of processing elements means for controlling said buffer array, said buffer array having an M number of rows and an N+1 number of columns, each of said N+1 number of columns defining each of an N+1 number of memory banks, wherein each of said N number of processing elements means is associated with a corresponding one of said N+1 number of memory banks, and wherein (N+1)th memory bank of said N+1 number of memory banks provides boundary data for Nth processing element means of said N number of processing elements means; a controller means for controlling said N number of processing elements, said controller being in communication with said microprocessor of said system.
56. The image processor of claim 55, wherein (N+1)th memory bank of said N+1 number of memory banks provides said boundary data for said Nth processing element means of said N number of processing elements means while convolutional algorithm is being applied to said image data.
57. The image processor of claim 55, wherein said controller means reads said image data in one of said N+1 number of memory banks, performs computation on said image data, and stores said images data in a different one of said N+1 number of memory banks.
58. The image processor of claim 55, wherein said controller means is a SIMD controller and simultaneously controls said N number of processing elements means.
59. The image processor of claim 55 further comprising a programmable input addresser means for transferring said image data from a data source to said arithmetic processing means by providing a source address onto a source address path, said source address identifying said data source.
60. The image processor of claim 55 further comprising a programmable output addresser means for transferring said image data from said arithmetic processing means to a memory by providing a write address onto a write path, said write address identifying a write address in said memory for storage of said image data.
61. The image processor of claim 55 further comprising a Huffman encoder means for encoding said image data received from said controller means to generate an encoded image data, and a Huffman decoder means for decoding said encoded image data.
62. The image processor of claim 61 further comprising one or more Huffman control registers for use by said Huffman encoder and said Huffman decoder.
63. A method for use by an image transform processor for processing image data, the method comprising: receiving the image data, using a programmable arithmetic processor, from a data source over a data path processing the image data, using the programmable arithmetic processor, wherein the programmable arithmetic processor comprises a first set of local buffers and a second set of local buffers; using each buffer in the first set of local buffers alternately for fetching input image data; using each buffer in the second set of local buffers alternately for storing output image data; and controlling transfer of the image data, using a programmable input addresser, from the data source to the programmable arithmetic processor by providing a source address onto a source address path, wherein the source address identifies the data source.
64. The method of claim 63 further comprising: controlling transfer of the image data to the programmable arithmetic processor by providing a storage address to the programmable arithmetic processor, wherein the storage address identifies a location within the programmable arithmetic processor for storage of the image data.
65. The method of claim 63, wherein the data source being a frame capture processor, and the source address identifying the frame capture processor.
66. The method of claim 63, wherein the data source being a memory, the source address being a memory address identifying a location of the image data within the memory.
67. The method of claim 63, wherein the data source being a memory, the source address path being a read address bus coupled between the programmable input addresser and the memory, the source address being a memory address identifying a location of the image data within the memory.
68. The method of claim 63 further comprising: controlling transfer of the image data, using a programmable output addresser, from the programmable arithmetic processor to a memory by providing a write address onto a write path, the write address identifying a write address in the memory for storage of the image data.
69. The method of claim 68, wherein the write path is a write address bus electrically connected to the programmable output addresser and the memory.
70. The method of claim 68, wherein the programmable output addresser further controlling transfer of the image data by providing a retrieval address to the programmable arithmetic processor, the retrieval address identifying a location within the programmable arithmetic processor for retrieval of the image data.
71. The method of claim 70, wherein the retrieval location within the programmable arithmetic processor is a buffer.
72. The method of claim 70, wherein the retrieval location within the programmable arithmetic processor is at least one buffer of a plurality of buffers.
73. The method of claim 63 further comprising: controlling transfer of the image data, using a programmable output addresser, from the programmable arithmetic processor to a memory by: (i) providing a write address onto a write address bus coupled between the programmable output addresser and the memory, the write address identifying a write address in the memory for storage of the image data, and (ii) providing a retrieval address to the programmable arithmetic processor, the retrieval address identifying a local buffer within the programmable arithmetic processor for retrieval of the image data.